Chapter 1
Introduction 


The designer of a VLSI circuit must consider not only functional correctness
but  timing  behavior.  Usually,  there  is  some  specification  of  how  quickly  the 
circuit  must  produce  its  output.  Once  a  schematic,  transistor-level  description  of 
the  circuit  is  produced,  it  must  be  forced  to  meet  the  delay  constraint.  This  is 
done  by  assigning  sizes  to  the  transistors. 

Note  that  this  is  a  different  problem  than  the  ratioing  of  transistor  sizes 
necessary  in  nMOS  and  some  CMOS  methodologies.  Those  considerations  involve 
waveform shape and may affect the circuit’s correctness; this paper only deals
with the speed of the circuit.

Increasing  the  size  of  transistors  in  a  VLSI  circuit  tends  to  decrease  the  delay 
through  the  circuit,  but  at  the  cost  of  increasing  its  area.  While  transistor  area  is 
usually  only  a  small  component  of  total  chip  area,  that  is  only  because  transistor 
sizes  are  usually  “reasonable.”  Minimizing  delay  can  result  in  huge  transistors; 
beyond  a  certain  point,  however,  larger  transistors  actually  increase  delay. 

Actual  minimization  of  the  circuit’s  delay  is  usually  not  required.  Instead,  the 
delay  must  be  reduced  to  meet  the  specified  constraint.  Given  a  delay  model, 
some  expression  for  maximum  delay  through  the  circuit  can  be  derived.  It  is  thus 
possible  to  view  the  problem  as  one  of  constrained  minimization: 

1)  minimize:  total  transistor  area 

subject  to:  actual  delay  <  delay  constraint 

Truly  minimizing  transistor  area  is  not  vital,  however;  in  fact,  any  “reasonable” 
solution  which  reduces  the  delay  below  the  constraint  will  be  acceptable.  Thus 
the  problem  can  also  be  cast  as 

2) minimize: excess delay above constraint

subject to: reasonable total transistor area.

Note  that  only  excess  delay  is  being  minimized;  no  reward  is  given  for  reducing 
delay  below  the  constraint. 
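
The objective of problem 2 can be written as a one-clause sketch. The predicate name excess_delay/3 is introduced here purely for illustration and is not taken from MOST.

```prolog
% excess_delay(+Delay, +Constraint, -Excess)
% Excess is the amount by which Delay exceeds Constraint,
% or 0 when the constraint is already met (no reward below it).
excess_delay(Delay, Constraint, Excess) :-
    Excess is max(0, Delay - Constraint).
```

With this definition, a delay of 12 against a constraint of 10 costs 2, while a delay of 8 costs 0, exactly as a delay of 10 would.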

Standard  non-linear  optimization  techniques  are  not  well  suited  to  these 
problems.  In  problem  1,  the  objective  function  is  quite  simple,  but  the  constraint 
is  both  highly  non-linear  and  expensive  to  compute  —  even  finding  a  feasible 
solution  is  very  difficult.  In  problem  2,  it  is  the  objective  function  which  is 
extremely  complex  and  difficult  to  deal  with.  The  major  difficulty  is  that  circuit 
delay  is  the  maximum  path  delay,  and  there  are  a  combinatoric  number  of  paths 
through  the  circuit;  furthermore,  path  delay  itself  is  an  extremely  complex 
function. Previous work, frequently involving simplified delay models, is covered
in Chapter 2.

Human  designers  avoid  considering  all  these  paths  by  using  intuition  and 
heuristics.  After  some  initial  configuration  is  chosen,  simulations  and  timing 
analyses are run on the circuit to find its critical paths (the paths through the
circuit whose delay exceeds the constraint), and the designer reduces their delay
sufficiently. Now some other paths may be critical, so the process iterates until
the  maximum  delay  through  the  circuit  is  satisfactory.  No  formal  attention  is 
paid  to  transistor  area;  presumably,  by  only  dealing  with  critical  paths, 
unimportant  transistors  will  be  left  at  minimum  size.  Such  critical-path 
heuristics  are  one  of  the  subjects  of  Chapter  3;  simpler  heuristics,  involving 
modifying  the  sizes  of  individual  transistors,  are  also  dealt  with. 

In  a  large  circuit,  however,  there  may  be  many  paths  each  requiring  more 
time  than  permissible;  if  an  iteration  of  the  critical-path  heuristic  is  required  for 
each  such  path,  the  total  computation  required  may  be  immense.  One  solution  is 
to  consider  more  than  one  critical  path  at  once;  this  unfortunately  leads  to 
extremely complicated decisions, involving simultaneous minimization of several
equations. Another approach is to work with the entire circuit at once by using a
probabilistic hill-climbing technique, such as simulated annealing, in the hope
that  the  cost  function  can  be  chosen  so  that  the  process  will  reduce  the  delay  on 
many  paths  simultaneously.  This  alternate  tactic  is  considered  in  Chapter  4. 

MOST  (Method  for  Ordering  and  Sizing  Transistors)  is  a  Prolog  program 
which  makes  use  of  the  information  supplied  by  PTA  (the  Prolog  Timing 
Analyzer)  in  conjunction  with  either  heuristics  or  a  simulated  annealing  algorithm 
to  assign  sizes  to  the  transistors.  In  contrast  to  most  previous  work,  it  sizes 
transistors  without  guidance  from  the  designer;  there  is  no  need  to  specify  which 
paths  to  examine,  for  example.  The  two  programs  together  are  approximately 
1500  lines  long,  or  350  clauses;  the  source  code  is  included  as  an  appendix. 
MOST's  design  and  implementation  are  described  in  Chapter  5. 

The  current  version  of  PTA  uses  a  simple  lumped  RC  delay  model  to  find  the 
maximum  delay  at  and  critical  path  to  each  signal  within  a  circuit.  An 
interesting feature of PTA is its ability to provide symbolic equations for the
resistance  and  capacitance  (and  hence  delay)  at  each  node.  The  details  of  PTA 
and  its  implementation  are  presented  in  Chapter  6. 

Throughout  the  paper,  fragments  of  Prolog  code  are  included  to  illustrate 
some  of  the  algorithms  being  described.  This  code  is  almost  invariably  an 
oversimplification,  but  much  clearer  than  the  complete  implementation.  In 
particular, questions of efficiency, either of storage or computation, are ignored.

Detailed results for the various approaches are presented in Chapter 7, but
Table 0 provides a short summary. All CPU times throughout the paper are for a
VAX 785 running interpreted C-Prolog.

Terminology

The  usual  statement  of  a  problem  is  “Assign  sizes  to  the  components  of 
circuit  X  so  that  all  its  outputs  are  produced  by  time  T.”  T  is  called  the 
maximum  delay  or  constraint,  and  the  entire  process  is  called  sizing  the  circuit. 
The  actual  delay  through  the  circuit  is  the  delay  given  a  particular  assignment  of 
sizes to its components, while the size or area of a circuit is the sum of the
component areas.

A circuit is described hierarchically in terms of cells; the particular data
structure used is called a constrained hierarchical schematic, and so the terms
cell and CHS are used interchangeably. A cell is either a primitive cell, or it is
made up of subcells. Transistors are usually regarded as being the primitive
elements, but most techniques apply equally well if logic gates or even macro cells
are taken as primitives.

Since  the  components  of  the  circuit  may  be  cells,  rather  than  transistors, 
sizing  a  circuit  may  involve  sizing  the  cells.  This  is  a  rather  unfortunate  choice  of 
phrasing;  sizing  a  cell  does  not  mean  deriving  its  maximum  bounding  box,  but 
rather assigning sizes to the primitives in its substructure.

A  related  problem  is  that  of  minimizing  delay  through  the  circuit.  In  this 
case, there is no explicit delay constraint; the goal is to make the circuit run as
fast  as  possible.  Additionally,  there  may  be  an  area  constraint,  or  some 
maximum  permissible  transistor  area. 

Computing  the  delay  through  the  circuit  is  the  job  of  the  timing  analyzer. 
Essentially,  what  needs  to  be  done  is  find  each  path  by  which  an  input  to  the 
circuit  can  affect  an  output,  and  then  take  the  maximum  of  all  these  path  delays. 
The  path  with  the  longest  delay  is  referred  to  as  the  critical  path. 
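
As a rough illustration of this definition (not of PTA itself), the following sketch enumerates every input-to-output path of a tiny acyclic circuit and keeps the slowest one. The edge/3 and input/1 facts, the delay figures, and all predicate names are hypothetical.

```prolog
% Hypothetical circuit: edge(From, To, Delay).
edge(in1, a, 3).
edge(in2, a, 1).
edge(a, out, 4).
edge(in2, out, 2).

input(in1).
input(in2).

% path_delay(+Node, -Delay, -Path): Path runs from some input to Node.
path_delay(Node, 0, [Node]) :-
    input(Node).
path_delay(Node, Delay, Path) :-
    edge(Prev, Node, D),
    path_delay(Prev, D0, Path0),
    Delay is D0 + D,
    append(Path0, [Node], Path).

% critical_path(+Output, -Delay, -Path): the longest path to Output.
critical_path(Output, Delay, Path) :-
    findall(D-P, path_delay(Output, D, P), Paths),
    max_pair(Paths, Delay-Path).

% max_pair(+Pairs, -Max): the Delay-Path pair with the largest delay.
max_pair([X], X).
max_pair([D1-P1, D2-P2 | Rest], Max) :-
    (   D1 >= D2
    ->  max_pair([D1-P1 | Rest], Max)
    ;   max_pair([D2-P2 | Rest], Max)
    ).
```

Here the query critical_path(out, D, P) examines three paths and selects the one through in1 and a. Because every path is enumerated explicitly, the cost of this naive version grows with the number of paths, which is the difficulty noted in the introduction.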

A  path  will  be  made  up  of  several  stages.  A  stage  (the  term  is  borrowed 
from Ousterhout [14]) is a chain of transistors from a driving source (usually an
input to the chip) to a use of the signal, either as an output of the chip or as a
gate  to  another  transistor.  A  stage  usually  corresponds  to  a  path  through  a  logic 
gate  and  its  associated  pass  transistors. 

All  three  of  the  boxed  areas  in  the  diagram  below  are  stages. 


Chapter  2 

Path  Sizing  and  Previous  Work 

A  more  restricted  form  of  the  problem  only  considers  a  single  path  instead  of 
the  entire  circuit.  Techniques  for  path-sizing  do  not  generalize  well,  for  two 
related reasons: the larger size of an entire circuit, and the additional complexity
caused  by  the  possibility  of  multiple  paths  through  the  circuit.  Much  previous 
work  has  been  done  on  this  problem,  however,  and  it  makes  a  good  introduction 
to the more general case. Moreover, it can be a useful component of a general
solution,  especially  in  conjunction  with  critical-path  heuristics. 

Path  Sizing 

In  most  previous  work,  the  path  is  viewed  as  being  made  up  of  logic  gates, 
rather  than  individual  transistors.  Furthermore,  most  authors  assume  that  signals 
generated  by  gates  in  the  path  are  not  used  anywhere  but  their  successor,  and 
that  inputs  similarly  do  not  come  from  outside  the  path.  These  very  restrictive 
assumptions allow a variety of approaches (summarized nicely by Matson [10])
to be successful.

Comparing these results is difficult, largely because of the paucity of statistics
provided  by  the  various  authors.  Table  1  at  the  end  of  this  section  summarizes  as 
well  as  possible,  leaving  question  marks  for  figures  not  provided. 

The  most  obvious  approach  is  simply  to  use  an  already-existing  general 
purpose  optimization  package  along  with  a  highly  accurate  timing  analyzer  such 
as  SPICE.  This  turns  out  to  be  impractical:  too  much  time  is  spent  in  simulation. 
A  particular  problem  is  that  symbolic  derivatives  are  not  available,  and  so  must 
be  computed  numerically  at  great  expense.  Matson  gave  results  for  using  the 
DELIGHT package along with SPICE in [10], but only as a contrast to the
efficiency of his work.

One way of avoiding the high computational cost of such an approach is to
use  a  simplified  model  for  transistors.  At  some  cost  in  accuracy,  this  saves  greatly 
in  computation,  especially  if  symbolic  derivative  information  can  be  calculated. 

When  using  the  simple  RC  model,  it  is  possible  to  derive  the  equations  for 
delay  in  terms  of  the  transistor  sizes,  and  then  solve  these  by  a  quasi-Newton 
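method. Consider the critical path as made up of n stages, each of which in turn
drives the next stage, and let $D_i$ be the delay of stage i. Then

\[ D_{tot} = \sum_i D_i \]

and, due to the lumped RC model,

\[ D_i = R_i \cdot C_i \qquad (*) \]

Now let T be all the transistors making up stage i; if $R_{other}$ and $C_{other}$ capture the
interconnect and output resistances and capacitances, then

\[ R_i = \sum_{t \in T} R_t + R_{other} \]

\[ C_i = \sum_{t \in T} C_t + C_{other} \]

The resistance of a transistor is inversely proportional to its size; the
capacitance, directly. Using this fact, equation (*) can be rewritten as

\[ D_i = \Bigl( \sum_{t \in T} \frac{k_1}{s_t} + R_{other} \Bigr) \cdot \Bigl( \sum_{t \in T} k_2 \, s_t + C_{other} \Bigr) \]

where $s_t$ is the size of transistor t and $k_1$, $k_2$ are the constants of proportionality.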

All  the  Di's  can  be  summed  to  give  an  equation  for  the  total  delay,  and  this 
equation  can  be  minimized  or  set  to  a  particular  value.  This  task  is  greatly 
simplified  by  the  ease  of  computing  the  partial  derivatives. 

One fact not immediately obvious in the above description is that the C_other
of one stage may involve the sizes of transistors in the next stage. One
component of C_other is the load capacitance, which includes the gate capacitance
of whatever gate in the next stage is being driven by the current stage. This means
that  the  problem  is  not  truly  separable:  stages  cannot  be  treated  independently. 
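
The coupling between stages can be seen in a small sketch of the lumped RC model. Everything here is an assumption made for illustration: the constants in k_r/1 and k_c/1, the predicate names, the choice to fold only the next stage's first gate into the "other" capacitance, and the use of the same capacitance constant for gate loading.

```prolog
% Hypothetical constants of proportionality.
k_r(10.0).   % resistance of a unit-size transistor
k_c(2.0).    % capacitance of a unit-size transistor

% stage_rc(+Sizes, -R, -C): total resistance and capacitance
% contributed by the transistors of one stage.
stage_rc([], 0.0, 0.0).
stage_rc([S | Ss], R, C) :-
    stage_rc(Ss, R0, C0),
    k_r(Kr), k_c(Kc),
    R is R0 + Kr / S,
    C is C0 + Kc * S.

% stage_delay(+Sizes, +ROther, +COther, -Delay): D = R * C for one stage.
stage_delay(Sizes, ROther, COther, Delay) :-
    stage_rc(Sizes, R, C),
    Delay is (R + ROther) * (C + COther).

% path_delay(+Stages, +CLoad, -Delay): Stages is a list of size lists.
% Each stage is loaded by the gate capacitance of the first transistor
% of the next stage; the last stage is loaded by CLoad.
path_delay([Sizes], CLoad, Delay) :-
    stage_delay(Sizes, 0.0, CLoad, Delay).
path_delay([Sizes, Next | Rest], CLoad, Delay) :-
    Next = [S | _],
    k_c(Kc),
    COther is Kc * S,
    stage_delay(Sizes, 0.0, COther, D),
    path_delay([Next | Rest], CLoad, D0),
    Delay is D + D0.
```

For the path [[1.0], [2.0]] with an external load of 8.0, both stages contribute 60.0, for a total of 120.0. Enlarging the second stage lowers its own resistance but raises the load seen by the first, which is why the stages cannot be optimized one at a time.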

It is not clear how easy it is to perform optimization even given the simplified
model: it depends on how many variables (or transistors) are involved in the
equations. Individual stages are likely to be short (Ousterhout [14] claims that
most are only two or three transistors in length), but the critical path may
consist of a large number of stages. In a 32-bit processor, for example, the critical
path is likely to be the carry chain through the ALU, which will have at least 32
stages.

Several  authors  use  variations  on  this  approach.  Glasser  and  Hoyte  [3]  model 
the  delay  on  a  path  as  the  sum  of  the  gate  delays.  This  model  ignores  the  shape 
of  the  input  waveforms,  but  Glasser  and  Hoyte  argue  that  its  estimates  are 
accurate within 30%. Each gate is modeled as a capacitor and a resistor, and
their  program  minimizes  the  equation  for  maximum  delay  using  relaxation 
techniques  in  order  to  find  the  proper  “scale  factor”  for  each  gate. 

Hedlund’s EO [5] (for Electrical Optimizer) can either minimize delay or
minimize power consumption with bounds on delay. It deals with several paths
simultaneously, as well as both polarities of input on a single path, by minimizing
(over the set of assignments to transistors) the maximum (over the paths and
polarities) delay. In other words, if $D_P(S)$ is the delay for assignment S on path P,
EO computes

\[ \min_S \bigl( \max_P D_P(S) \bigr) \]

The maximum is approximated by the continuous function smax (for “smoothed
maximum”; see Ruehli et al. [19]), where

\[ \mathrm{smax}(x_1, \ldots, x_n) = \ln\bigl( e^{x_1} + \cdots + e^{x_n} \bigr) \]

The minimization is done by a quasi-Newton non-linear optimization method.
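
A smoothed maximum of this kind can be sketched in a few clauses. The sketch assumes the common log-sum-exp form, ln(e^x1 + ... + e^xn); whether EO scales its arguments in some further way is not stated here. The predicate names are hypothetical.

```prolog
% smax(+Xs, -S): S = ln(e^X1 + ... + e^Xn) for a non-empty list Xs.
% The sum is taken relative to the largest element to avoid overflow.
smax(Xs, S) :-
    Xs = [X | Rest],
    list_max(Rest, X, M),
    sum_exp(Xs, M, 0.0, Sum),
    S is M + log(Sum).

% list_max(+Xs, +Max0, -Max): running maximum of a list.
list_max([], M, M).
list_max([X | Xs], M0, M) :-
    M1 is max(X, M0),
    list_max(Xs, M1, M).

% sum_exp(+Xs, +M, +Acc, -Sum): Sum = Acc + sum of e^(X - M).
sum_exp([], _, Sum, Sum).
sum_exp([X | Xs], M, Acc, Sum) :-
    Acc1 is Acc + exp(X - M),
    sum_exp(Xs, M, Acc1, Sum).
```

The result is never below the true maximum and approaches it when one argument dominates; two equal arguments x give x + ln 2. Unlike max, the function is differentiable everywhere, which is what a quasi-Newton method needs.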

Another  way  of  potentially  lessening  computation  is  to  use  heuristics  instead 
of  non-linear  optimization  techniques.  Since  the  problem  is  rather  structured,  and 
the optimal solution (in this case, the absolute minimum transistor size) is not
required,  it  may  be  possible  to  capitalize  on  this  structure  via  heuristics  as  a 
human  designer  does.  Note  these  are  heuristics  for  sizing  a  single  path,  as  distinct 
from  critical-path  heuristics  for  sizing  the  entire  circuit. 

Kao,  Fathi,  and  Lee  [7]  use  an  extremely  simple  heuristic:  at  each  step,  the 
gate contributing the most to delay relative to its current area is increased in size.
This “scapegoat heuristic” makes no attempt to capture the complex interactions
of all the gates within the path, but they claim that performance is satisfactory
even  for  relatively  large  circuits. 

Trimberger's Andy [21] uses the “ramped-driver” heuristic. Each gate is
divided  into  several  stages  which  increase  in  size  by  a  fixed  fan-out  factor  in 
order  to  drive  the  (presumably)  large  capacitive  load  at  the  output  of  the  gate. 
For  each  gate,  the  capacitive  load  is  computed,  and  then  an  equation  for  the 
proper  number  of  stages  for  the  gate  is  solved.  The  capacitances  are  computed 
starting  at  the  end  of  the  circuit,  and  then  the  program  works  backwards,  sizing 
each  gate  as  it  goes  (this  is  done  because  the  output  capacitance  on  gate  i  depends 
on the input capacitance of gate i+1).
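
The equation Andy solves is not reproduced here, so the following is only a sketch of the general idea of a ramped driver under a simple assumption: stages of relative size 1, F, F^2, ... are added until the last stage drives the load with a fan-out of at most F. The predicate names and the stopping rule are illustrative, not Trimberger's.

```prolog
% ramp(+CIn, +CLoad, +F, -Sizes): relative sizes of the driver stages,
% where CIn is the input capacitance of a unit-size stage, CLoad is the
% load to be driven, and F > 1 is the fixed fan-out factor.
ramp(CIn, CLoad, F, Sizes) :-
    F > 1,
    ramp_(CIn, CLoad, F, 1, Sizes).

% Stop once the current stage sees a fan-out of at most F.
ramp_(CIn, CLoad, F, Size, [Size]) :-
    CLoad =< F * Size * CIn, !.
% Otherwise add this stage and continue with one F times larger.
ramp_(CIn, CLoad, F, Size, [Size | Sizes]) :-
    Size1 is Size * F,
    ramp_(CIn, CLoad, F, Size1, Sizes).
```

For a unit input capacitance, a load of 64, and a fan-out factor of 4, this yields three stages of sizes 1, 4, and 16.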

Trimberger  claims  that  the  ramped-driver  heuristic,  although  it  does  not 
minimize  delay,  is  desirable  because  in  general  it  requires  less  power  and  smaller 
area. Additionally, he says, it is closer to how human designers attack the
problem.

Lee and Soukup [9] take a similar approach, first solving for the optimal
number  of  stages,  then  optimizing  the  stage  sizes  (as  opposed  to  Trimberger's 
fixed  fan-out  factor).  They  also  discuss  the  minimization  of  area;  given  a 
constraint  on  the  delay,  they  use  Lagrange  multipliers  to  solve  the  optimization 
problem.  However,  they  quote  no  statistics  on  the  efficiency  of  their  program. 

Matson [10] argues that heuristics are in general less efficient than non-linear
optimization methods, particularly when the delay constraint is near the
minimum  delay  achievable  by  the  circuit.  Additionally,  he  claims,  the  accuracy  of 
the  timing  models  is  insufficient  for  high-performance  VLSI  design.  On  the  other 
hand,  a  general  non-linear  optimizer  fails  to  take  advantage  of  the  structure  of 
the  problem.  As  a  result,  he  uses  a  special-purpose  optimizer  in  conjunction  with 
a  timing  model  [11]  of  intermediate  complexity. 

The  particular  optimization  problem  Matson  attacks  is  minimizing  power 
subject to a constraint on maximum delay, but the technique applies equally well
to  minimizing  transistor  area.  In  both  cases,  the  objective  function  is  separable:  it 
is  the  sum  of  contributions  from  each  of  the  individual  components  (either 
macrocells,  gates,  or  transistors)  along  the  path.  If  the  delay  is  also  regarded  as 
the  sum  of  individual  contributions  then  it  too  is  separable.  This  is  not  strictly 
true, due to the effects of waveform shape and the interactions between input and
output  capacitances,  but  is  a  useful  assumption. 

Using  the  method  of  duality,  a  variation  on  Lagrange  multipliers,  Matson 
takes advantage of the (approximate) separability of area and delay by dividing the
minimization  into  a  minimization  of  each  cell  in  succession.  Instead  of  one 
minimization  over  a  very  large  vector  space,  many  minimizations  over  small 
vector  spaces  are  performed  instead,  and  since  the  cost  of  non-linear  minimization 
grows combinatorically, the divide-and-conquer method greatly speeds up the
process.

Table 1

Path sizing: comparison of previous work

Algorithm            Circuit          Size   Reduction   Machine       CPU time
DELIGHT/SPICE [10]   Inverter chain   6      ?           VAX 11/750    3151.7
Relaxation [3]       Inverter chain   SCO    ?           DEC 20/60     50
Quasi-Newton [5]     Control logic    fi     63%         VAX 11/750    0.1
                                      20     62%                       2.3
Scapegoat [7]        1-bit adder      13     ?           XEROX 1108    20
Ramped Driver [21]   PLA              ?      40%         DEC 20/60     ?
Duality [10]         Inverter chain   6      ?           DEC 20/60     16.3
                     4-bit adder      76     ?                         522.3

Generalization

Most of these techniques consider only a chain of gates, rather than a path at
the transistor level, and ignore the possibility of outside influences. The presence
of pass transistors complicates the issue. It is not sufficient to size the transistor
whose output is the gate of the next transistor in line; other transistors
connected to that transistor may need to be sized as well. Heuristics will have
more difficulty in this situation, especially since a single transistor may be
connected to several different transistors in the critical path.

Most authors also gloss over the question of minimum widths of transistors.
Heuristics which only increase the size of transistors cause no difficulty here, of
course, but all non-linear minimizations must actually be constrained
minimizations. Strictly speaking, the transistor widths must also be integers (or
integer multiples of some fixed lambda); most techniques simply round off in a
post-processing phase to deal with this difficulty.

With such modifications, these approaches can be used for path sizing in
critical-path  heuristics.  The  ramped-driver  heuristic  does  not  really  apply  to  this 
case,  however,  since  it  is  not  desirable  to  add  new  stages. 

An  important  question  is  how  well  these  techniques  generalize  to  the  case  of 
an entire circuit. The next chapter considers expanding the “scapegoat heuristic”
to the entire circuit. Matson’s duality technique, however, does not generalize as
nicely. The divide-and-conquer nature of the technique would make it
particularly appropriate for large circuits, so it is worth examining just how it
breaks down.

The key to being able to divide the optimization problem is to be able to
separate the total path delay into individual cell delays. Circuit delay can indeed
be broken down into the individual cells on the critical path, but as transistors are
sized, the critical path changes. In other words, the decomposition is different at
different  times  in  the  process.  Matson  only  deals  with  specific  paths,  which 
remain  the  same  throughout  the  analysis. 

Fishburn and Dunlop have recently published some impressive work. They
have shown that, given an RC delay model, circuit delay is a convex function of
transistor sizes (so far, they have been unable to generalize their result to slope
delay models). The pertinent feature of convex functions is that any local
minimum is in fact a global minimum. This in turn implies that simulated
annealing  or  multiple  initial  configurations  are  not  required  to  avoid  local  minima. 

The  approach  used  by  their  TILOS  program,  however,  is  in  fact  a  slight 
variation  on  a  scapegoat  heuristic  described  below.  The  primary  value  of  their 
result  appears  to  be  in  the  confirmation  that  heuristics  are  in  general  “good,” 
rather  than  providing  any  algorithmic  method  for  solving  the  problem. 

Simulated annealing is still a potentially viable technique. As will be
discussed  in  chapter  4,  the  cost  function  is  not  necessarily  the  maximum  delay 
through  the  circuit.  A  more  complex  cost  function  —  for  example,  the  smoothed 
maximum  of  all  the  path  delays  —  may  not  have  the  convex  property,  but  may 
still  be  a  more  accurate  measure  of  how  close  to  a  solution  the  configuration  is.  In 
addition, simulated annealing techniques will still apply in the case of non-convex
delay functions, which may arise from a more accurate delay model.


Chapter  3 

Heuristic  approaches 

In  the  absence  of  an  algorithmic  solution,  it  is  natural  to  search  for  viable 
heuristics  —  all  the  more  so  since  this  is  how  human  designers  currently  attack 
the  problem.  The  usual  course  of  such  heuristics  is  to  perform  an  analysis  of  the 
circuit,  giving  such  information  as  the  maximum  delay  through  the  circuit  and  the 
critical path, and then use this information to guide the next resizing step.

    size(Circuit, Constraint) :-
        analyze(Circuit, Delay, Info),
        resize_if_necessary(Circuit, Delay, Constraint, Info).

    % The constraint is met: done!
    resize_if_necessary(_Circuit, Delay, Constraint, _Info) :-
        Delay =< Constraint.

    % The constraint is violated: apply a heuristic, then re-analyze.
    resize_if_necessary(Circuit, Delay, Constraint, _Info) :-
        Delay > Constraint,
        apply_heuristic(...),
        size(Circuit, Constraint).

Two  criteria  can  be  applied  to  heuristics:  efficiency  and  optimality.  Optimality  is 
simply  a  measure  of  how  close  to  the  optimum  performance  circuit  the  heuristic 
can  come.  Although  the  absolutely  optimum  circuit  is  not  required,  if  a  heuristic 
can  not  even  approach  it  with  consistency,  it  is  not  particularly  useful. 

Assuming a heuristic does give reasonable results, efficiency measures how
quickly it does so. Some heuristics may be extremely efficient for reducing delay up
to, say, 40%, but extremely inefficient beyond that. A major factor in efficiency is
the number of timing analyses required, so many of the different approaches are
attempts to reduce this number. Potentially, however, the amount of work done
to  avoid  reanalysis  may  actually  become  the  dominant  factor. 

Heuristics  fall  into  two  major  categories:  transistor-level  and  critical-path. 
Transistor-level  heuristics  work  with  one  transistor  at  a  time.  A  timing  analysis  is 
performed,  and  then  one  transistor  is  resized.  The  advantage  of  this  scheme  is  its 
simplicity; in general, no complicated equations need to be solved, and
interactions  between  critical  paths  do  not  need  to  be  considered.  The 
corresponding  disadvantage  is  the  lack  of  efficiency,  particularly  if  a  full  timing 
analysis  needs  to  be  performed  after  sizing  each  transistor.  Compromise  schemes 
involving  sizing  several  transistors  before  performing  another  analysis  avoid  this 
problem,  but  only  at  the  cost  of  increasing  complexity. 

Critical-path  heuristics  mirror  human  designers’  strategies.  The  critical  path 
through  the  circuit  is  found,  and  then  sized  so  that  it  meets  the  delay  constraint. 
The  process  is  repeated  until  all  path  delays  have  been  reduced  sufficiently.  This 
reduces the number of analyses of the circuit, but problems now arise due to the
potential interaction of critical paths: if a transistor is on two different paths,
what  should  its  size  be? 

To  reduce  the  number  of  analyses  even  further,  more  than  one  critical  path 
can be sized before re-analyzing. Either all the paths are considered
simultaneously, or some form of iteration is performed without reanalysis. This
approach compounds the difficulties of interaction, and potentially increases the
complexity of the path-sizing problem. On the other hand, it can drastically
reduce the number of analyses, particularly when many paths are critical
simultaneously.

Mixed  approaches,  combining  the  above  heuristics,  are  also  possible.  For 
example,  a  critical-path  heuristic  may  be  combined  with  a  transistor-level 
heuristic  for  transistors  not  on  the  critical  path.  Alternatively,  one  heuristic  may 
follow  another;  a  heuristic  considering  multiple  critical-paths  may  be  used 
initially,  and  once  the  number  of  critical  paths  is  reduced  sufficiently,  a  standard 
critical-path heuristic may take over. Currently, very little work has been done
in this promising area.

Transistor-level Heuristics

The  two  important  questions  at  this  level  are  what  transistor  to  resize  and 
how to do the resizing. There are several ways to choose what transistor (or
transistors) to resize.

Most  obvious  is  random  choice.  This  is  extremely  simple,  and  fast  to 
compute,  but  results  in  an  unacceptably  large  number  of  timing  analyses. 

A simple generalization of the “scapegoat heuristic,” in which the transistor
contributing the most delay is resized, is quite simple to implement.

    apply_heuristic(Transistors, Equations) :-
        choose_scapegoat(Sizes, Equations, Culprit),
        adjust_size(Culprit).

Once  again,  this  computation  is  not  too  expensive;  on  the  other  hand,  it  is  difficult 
to say just what “contributing the most delay” means. In the final configuration,
some transistors will still contribute more delay than others, and eventually the
point of diminishing returns is reached for an individual transistor: even though it
contributes much of the delay, it is better to leave it that way and resize another
transistor instead.

One  way  of  avoiding  this  problem  is  by  instead  examining  the  change  in 
delay  resulting  from  resizing  each  transistor.  In  essence,  this  approach  works  with 
the  derivatives  rather  than  the  delay  function  itself.  The  derivatives  can  be 
computed  symbolically,  or  some  numeric  approximation  may  be  used  instead.  A 
simple  and  useful  approximation  to  the  derivative  is  the  change  in  cost  given  a 
unit  change  in  the  size  of  the  transistor. 
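
The unit-change approximation can be sketched as follows. The delay model (a chain of inverters with resistance 10/S, gate capacitance 2*S, and a fixed final load of 20) and all predicate names are hypothetical, chosen only to keep the example self-contained; they are not taken from MOST or PTA.

```prolog
% delay(+Sizes, -D): hypothetical delay of an inverter chain. Each
% inverter has resistance 10/S and drives the gate capacitance 2*S1
% of its successor; the last inverter drives a fixed load of 20.
delay([S], D) :-
    D is 10.0 / S * 20.0.
delay([S, S1 | Rest], D) :-
    delay([S1 | Rest], D0),
    D is D0 + 10.0 / S * 2.0 * S1.

% bump(+Index, +Sizes, -Sizes1): increase the Index-th size by one unit.
bump(1, [S | Ss], [S1 | Ss]) :-
    S1 is S + 1.
bump(I, [S | Ss], [S | Ss1]) :-
    I > 1,
    I1 is I - 1,
    bump(I1, Ss, Ss1).

% sensitivity(+Sizes, +Index, -Change): change in delay caused by a
% unit increase in the size of transistor Index.
sensitivity(Sizes, I, Change) :-
    delay(Sizes, D0),
    bump(I, Sizes, Sizes1),
    delay(Sizes1, D1),
    Change is D1 - D0.

% best_resize(+Sizes, -Index, -Change): the unit increase that lowers
% the delay the most (the most negative Change).
best_resize(Sizes, Index, Change) :-
    length(Sizes, N),
    findall(C-I, (between(1, N, I), sensitivity(Sizes, I, C)), Cs),
    msort(Cs, [Change-Index | _]).
```

Starting from sizes [1, 1], enlarging the first inverter changes the delay by -10.0 and enlarging the second by -80.0, so the second is chosen. Each candidate costs one evaluation of the delay function, which is the price paid for not deriving the derivatives symbolically.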

Of course, there is no need to restrict these techniques to a single transistor.
More than one transistor can be resized in each pass. The above approaches can
all be generalized in very straightforward ways to consider multiple transistors.
This leads to methods in which every transistor contributing more than a specified
amount of delay is increased in size, or where every transistor whose resizing
would decrease delay is resized.

Dealing  with  multiple  transistors  simultaneously  is  almost  always 
advantageous.  Very  little  additional  work  needs  to  be  done,  since  the  effect  of 
each  change  needs  to  be  computed  in  order  to  find  the  best  change,  and  the 
number  of  analyses  is  almost  always  reduced.  The  one  way  in  which  this  decision 
can  be  harmful  is  if  the  interrelationships  between  transistors  are  too  great,  and 
resizing one transistor affects the decisions about whether to resize others. This
can lead to oscillation.

There  are  several  ways  to  decide  how  much  to  adjust  the  size  of  the 
transistors.  Simplest,  and  surprisingly  efficient,  is  simply  increasing  the  size  of  the 
transistor  by  one  unit.  The  fact  that  other  transistor  sizes  will  continue  to  change 
reduces  the  advantage  of  more  complicated  schemes,  such  as  solving  for  the 
optimal  size  given  the  current  size  assignments  to  other  transistors. 

Critical-path  Heuristics 

The  basic  idea  of  finding  the  critical  path  and  then  resizing  it  is  simple 
enough  to  implement. 

    apply_heuristic(Circuit, Critical_path, Constraint, Delay) :-
        resize(Critical_path, Constraint, Delay).

One  question  is  how  to  do  the  path  sizing.  Three  of  the  approaches  discussed  in 
Chapter 2 are worth considering:

1)  Individual  transistor  heuristic,  in  which  a  single  transistor  at  a  time  is 
increased  in  size. 

2)  Numeric  solution  of  the  path  delay  equations  for  optimal  sizes. 
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3)  A  simplified  numeric  solution,  given  some  assumptions  about  the 
eventual  sizes. 

Each  of  these  methods  is  more  complicated  and  time-consuming  than  the 
ways  of  sizing  individual  transistors  discussed  above.  From  an  efficiency 
standpoint,  then,  the  question  is  whether  the  additional  computation  done  here  is 
compensated by a corresponding decrease in the number of timing analyses
required.

The real difficulty in critical-path heuristics is the interaction between
different paths. What is to be done if a transistor has had a size assigned to it in
the course of sizing a path, and then is a component of another critical path
considered later?

There are two possible strategies for dealing with this situation. One is to
allow each transistor to be sized only once; once it has a size assigned to it, it is
fixed. This is quite simple to implement, but may not be sufficient. Consider the
following circuit:

[Figure: inverter X is shared by two paths. The upper path drives a large
capacitive load; the lower path feeds a long inverter chain.]


In the case of a chain of identical gates, the ramped-driver heuristic provides
optimal solutions. The proper size assignment, though, depends on the length of
the path. Assume both paths exceed the delay constraint, and further assume
that the lower (longer) path is the most critical. When it is sized, inverter X will
have some relatively small size assigned to it. Now the upper path is still critical;
to reduce its delay, inverter X will have to be increased in size.

The other method allows transistors to be resized as many times as necessary.
However, resizing a transistor affects the delays along paths which have previously
been sized, potentially requiring those paths to be sized once again. This in turn
creates the possibility of a loop between two paths, where sizing one undoes the
effect of sizing the other. In terms of the above example, when the size of
transistor X is increased to reduce the delay on the upper path, the lower path's
delay will be increased.

It might seem that Prolog's backtracking mechanism offers an elegant
solution to this problem: assign a set of sizes to a path, and if no global
assignment can be reached satisfying the delay constraints, simply backtrack and
size the path differently. This is undeniably possible in theory, but in practice
extremely inefficient. Backtracking throws away all the information gained, and
so there is no way to know what caused the failure or how next to size the path.
Although assert could be used to keep the information, so much would need to
be asserted that the Cprolog interpreter, at any rate, would not allow it.

It  is  possible  to  set  up  relationships  among  transistor  sizes  along  a  path  so 
that  resizing  one  of  the  transistors  implicitly  causes  the  resizing  of  all  the  others. 
Techniques from the AI community such as access demons might be used for this
problem: whenever one transistor's size is modified, the demon could change other
sizes as necessary. Once again, however, there is the potential for loops in which
two transistors' sizes mutually depend on each other. This would seem to require
a rather general symbolic mathematics package for solving simultaneous
nonlinear equations at each step.

Multiple  critical  paths  can  be  considered  simultaneously.  As  more  and  more 
paths  are  considered,  the  improvements  made  at  each  stage  are  potentially 
greater,  but  the  amount  of  computation  that  needs  to  be  done  also  increases. 


Chapter  4 

Simulated  Annealing 


The  basic  algorithm 

Simulated annealing [8] is a probabilistic hill-climbing algorithm. It differs
from standard hill-climbing in that a new configuration may be accepted even if it
increases the cost; the chance of this occurring is controlled by a parameter called
the temperature, which steadily decreases. This prevents getting trapped in a
local minimum due to an unfortunate choice of initial configuration.

The algorithm divides into an outer loop, which gradually decreases the
temperature, and an inner loop, which performs a number of iterations at each
temperature. At each iteration, a new configuration is generated, its cost is
evaluated, and then the acceptance function determines whether or not the
configuration is accepted. Any configuration decreasing the cost will of course be
accepted; different acceptance functions give different chances of accepting
configurations which increase the cost. The usual acceptance function, used
throughout this paper, is

$$\text{acceptance chance} = \exp\left(-\frac{\Delta\text{Cost}}{T}\right)$$

where $\Delta\text{Cost}$ is the increase in cost and $T$ is the current temperature.

The algorithm can terminate in two ways: it succeeds if delay is reduced
below the desired goal (although this success may be postponed briefly in order to
minimize the sizes of the transistors), and it fails if some failure criterion is met.
A standard failure criterion is no change in the configuration after a certain
number of times through the main loop.

anneal(Circuit, Constraint) :-
    initialize(Configuration, Delay, Temperature),
    outer_loop(Configuration, Delay, Constraint, Temperature).

outer_loop(Configuration, Delay, Constraint, T) :-
    Delay =< Constraint.                % success

outer_loop(Config, Delay, Constraint, T) :-
    Delay > Constraint,
    inner_loop(N, Config, Cost, T, New_config, New_delay).
    % the temperature is then lowered and outer_loop is repeated

inner_loop(N, Config, Cost, T, New_config, New_delay) :-
    generate(Config, Test_config),
    cost(Test_config, Test_cost),
    accept(Test_cost, Cost, T),
    N1 is N-1,
    inner_loop(N1, Test_config, Test_cost, T,
               New_config, New_delay).

inner_loop(N, Config, Cost, T, New_config, New_delay) :-
    N1 is N-1,
    inner_loop(N1, Config, Cost, T, New_config, New_delay).
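
The control flow of the pseudo-code can also be sketched in Python (used here
only for illustration; MOST itself is written in Prolog). The callbacks
cost_fn, generate and done, the geometric cooling factor, and the iteration
counts are all assumptions made for this example and are not taken from MOST.

```python
import math
import random


def accept(test_cost, cost, temperature, rng):
    """Always accept an improvement; accept a worse cost with
    probability exp(-increase / temperature)."""
    if test_cost <= cost:
        return True
    return rng.random() < math.exp(-(test_cost - cost) / temperature)


def anneal(config, cost_fn, generate, done, temperature=10.0,
           cooling=0.9, iterations=50, max_outer=200, seed=0):
    """Outer loop lowers the temperature; inner loop tries new configurations.

    cost_fn, generate and done are caller-supplied (hypothetical) callbacks.
    Returns the final configuration and whether `done` was satisfied.
    """
    rng = random.Random(seed)
    cost = cost_fn(config)
    for _ in range(max_outer):
        if done(config):                      # e.g. delay <= constraint
            return config, True
        for _ in range(iterations):
            candidate = generate(config, rng)
            candidate_cost = cost_fn(candidate)
            if accept(candidate_cost, cost, temperature, rng):
                config, cost = candidate, candidate_cost
        temperature *= cooling                # assumed geometric cooling
    return config, done(config)
```

In this sketch the success test is made once per temperature step, which
loosely mirrors the way outer_loop checks the delay before entering the inner
loop again.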


In terms of the particular problem being attacked, a configuration is simply
an assignment of sizes to the transistors. New configurations are generated by
random perturbations of each size; by restricting these perturbations to be
integers, we assure that the final transistor size is also integral. The cost of a
configuration includes a penalty for exceeding the specified delay, and another
term relating to the total size of the transistors (in order to keep the circuit from
growing too large).
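
A generation step of this kind might look like the following Python sketch
(Python is used only for illustration). The function name perturb, the
min_size floor of one unit, and the uniform choice of step are assumptions;
the text specifies only that each size receives a random integer perturbation
bounded by a maximum.

```python
import random


def perturb(sizes, max_step, rng, min_size=1):
    """Return a new configuration in which each transistor size moves by a
    random integer in [-max_step, max_step], never dropping below min_size."""
    return [max(min_size, s + rng.randint(-max_step, max_step)) for s in sizes]
```

Because every step is an integer and the starting sizes are integers, every
configuration produced this way is integral as well.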

Computing the cost involves computing the actual delay through the circuit.
The PTA timing analyzer takes a given configuration and finds the delays of all
the nodes. A large amount of additional information is supplied as well: the
transistor causing the maximum delay for each node, allowing critical paths to be
recreated, and symbolic equations for the resistance and capacitance of each
transistor. PTA uses a depth-first search algorithm, ensuring that nodes will not
be reprocessed, and a simple RC model for simplicity and speed. Despite this, it
consumes the bulk of the program's time; for a 100-transistor circuit, for example,
timing analysis requires over 10 cpu seconds.

Many  parameters  can  be  varied  in  an  attempt  to  improve  performance. 
Among  these  are  the  initial  temperature  and  configuration,  the  rate  at  which  the 
temperature  decreases,  the  proper  number  of  iterations  at  a  given  temperature, 
the  acceptance  chance,  and  the  generation  procedure.  Much  theoretical  research 
has been done in this area, but so far none of these results have been incorporated
into this work.

Cost  functions 

A major determining factor in the performance of the algorithm is the cost
function. Several different functions have been tried, all revolving around the
idea of charging a penalty for a delay exceeding the constraint. If the desire were
simply to reduce circuit delay to a minimum, then the penalty could just be the
delay; since the problem is only to meet a specified criterion, though, no bonus is
given for reducing delay below this bound.

$$\text{Penalty} = \max(\text{Delay} - \text{Constraint},\ 0)$$

Initially, this was the entire cost function. Since it ignored transistor sizes, it led
to very large circuits.

The first cost function still weighted the maximum delay through the circuit
much more heavily than the total size.

$$\text{Cost} = 10\,\text{Penalty} + \text{TotalSize} \qquad (1)$$

The process essentially divides into two steps: first the sizes of the transistors
are increased as delay is reduced to the constraint, and then the total size component
of the cost takes over, and the sizes are gradually reduced. A satisfactory solution
is usually reached, but rather slowly, since essentially only one critical path is
being considered.

Since one of the justifications for using simulated annealing was the
possibility of dealing with multiple paths simultaneously, the next improvement
was to consider all critical paths in the cost function.

$$\text{Cost} = 5\,\text{Penalty}_{\max} + 5 \sum_{i \in \text{nodes}} \text{Penalty}_i + 2\,\text{TotalSize} \qquad (2)$$

The most critical path is weighted more heavily than others, since it is still the
primary limitation on the circuit.

The third cost function weights the sizes more heavily.

$$\text{Cost} = 2\,\text{Penalty}_{\max} + \sum_{i \in \text{nodes}} \text{Penalty}_i + \text{TotalSize} \qquad (3)$$

For each of these three cost functions, maximum perturbation sizes of 1, 2, and 4
were tried in turn. Results are summarized in Table 2.
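
For concreteness, the three cost functions can be written out as a short Python
sketch (Python is used only for illustration). It assumes that the middle term
of cost functions 2 and 3 sums the penalty over all nodes, and that every node
is compared against the same constraint; both are interpretations rather than
details confirmed by the text. The function names are hypothetical.

```python
def penalty(delay, constraint):
    """No bonus is given for beating the constraint."""
    return max(delay - constraint, 0)


def cost1(node_delays, constraint, total_size):
    # Only the worst delay is penalized.
    return 10 * penalty(max(node_delays), constraint) + total_size


def cost2(node_delays, constraint, total_size):
    worst = penalty(max(node_delays), constraint)
    everything = sum(penalty(d, constraint) for d in node_delays)
    return 5 * worst + 5 * everything + 2 * total_size


def cost3(node_delays, constraint, total_size):
    worst = penalty(max(node_delays), constraint)
    everything = sum(penalty(d, constraint) for d in node_delays)
    return 2 * worst + everything + total_size
```

With node delays of 12, 9 and 11 against a constraint of 10 and a total size of
20, the worst penalty is 2 and the summed penalty is 3, giving costs of 40, 65
and 27 respectively.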


Table 2
Cost function performance
(average of two runs, 48-transistor circuit)

cost      maximum       reduction  reduction  cpu time   size
function  perturbation  requested  achieved   (seconds)  increase
--------  ------------  ---------  ---------  ---------  --------
1         1             35%        38%        441        114%
1         2             35%        38%        153        129%
1         4             35%        38%        153        204%
1         1             50%        *44%       824        150%
1         2             50%        50%        537        276%
1         4             50%        *47%       637        415%
2         1             35%        38%        297        114%
2         2             35%        39%        155        188%
2         4             35%        39%        396        282%
2         1             50%        50%        972        203%
2         2             50%        50%        485        227%
2         4             50%        *45%       845        351%
3         1             35%        35%        254        150%
3         2             35%        35%        596        210%
3         4             35%        38%        155        321%
3         1             50%        50%        791        240%
3         2             50%        50%        595        304%
3         4             50%        *25%       204        152%

* = failure

The only clear result is that a maximum perturbation of 2 is the best choice; no
obvious indication as to the most desirable cost function is apparent. Once delay
reductions beyond 50% are requested, however, cost functions weighting delay
much more heavily than size are required to obtain solutions.

Obviously, the major bottleneck of the program is the time required to
analyze the circuit. As described above, the algorithm requires an analysis for
each configuration, and then discards all the information except the delays. Using
this extra information to avoid some analyses can result in large performance
gains.

Near the solution, most new configurations will be rejected; in fact, most are
“obviously wrong” in that they increase the critical path delay sharply. The goal
is to screen the “obviously wrong” configurations by using approximate timing
analysis and avoid fully analyzing them. This idea is similar to the one suggested
by Greene and Supowit [4].

Consider  the  delay  along  the  critical  path.  If  the  new  configuration  increases 
the  delay  on  the  critical  path,  it  will  certainly  increase  the  maximum  delay 
through  the  circuit  as  a  whole.  Conversely,  if  a  configuration  decreases  the 
critical  path  delay,  it  will  probably  decrease  the  delay  through  the  circuit  as  a 
whole.  Thus,  analysis  of  the  effect  of  a  change  on  the  critical  path  is  “almost”  as 
useful  as  analysis  of  the  circuit  as  a  whole,  and  —  since  only  one  path  needs  to  be 
considered  —  much  quicker. 

Since PTA provides symbolic equations for delay at each node, all that needs
to be done is evaluate these equations with the new gate sizes included. This is
substantially faster than performing a complete timing analysis (see Table 3).

Instead of just computing the path delay, the cost of the new configuration
given the previous delay equations is calculated. This computation is still
substantially quicker than reanalyzing the circuit.




Table 3
Circuit Analysis vs Equation Evaluation

                  evaluation        evaluation
size   analysis   (delay equation)  (cost equations)  ratio
----   --------   ----------------  ----------------  -----
6      -          -                 -                 13.6
8      1.38       -                 -                 17.3
24     2.15       -                 -                 30.7
48     -          -                 0.15              30.1
96     10.60      0.07              0.48              22.1

(- : value not legible)

A standard acceptance test is performed on this estimated cost; if it passes, then full
analysis and another acceptance test occur. To avoid biasing the algorithm against
configurations which increase the cost (since they now must pass two tests) the same
random number is used for both acceptance tests.
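
The two-stage test can be sketched in Python as follows (Python is used only
for illustration). The names passes and screened_accept are hypothetical, and
the two cost arguments are modeled as zero-argument callables so that the
expensive full analysis runs only when the cheap estimate has already passed.

```python
import math
import random


def passes(new_cost, old_cost, temperature, u):
    """Acceptance test against a pre-drawn uniform number u in [0, 1)."""
    if new_cost <= old_cost:
        return True
    return u < math.exp(-(new_cost - old_cost) / temperature)


def screened_accept(estimate_cost, full_cost, old_cost, temperature, rng):
    """Screen with the estimated cost, then confirm with the full cost.

    The same random number u is used for both tests, so a configuration
    whose full cost is no lower than its estimate is accepted exactly
    when it would have passed the full test alone.
    """
    u = rng.random()
    if not passes(estimate_cost(), old_cost, temperature, u):
        return False                      # rejected without full analysis
    return passes(full_cost(), old_cost, temperature, u)
```

Drawing u once matters: with two independent draws, a cost-increasing
configuration would have to win twice and would be accepted less often than
the acceptance function intends.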

Greene  and  Supowit  view  the  screening  process  as  biasing  the  generation 
function;  I  prefer  to  consider  it  as  a  simple  preliminary  cost  function.  In  either 
case,  the  effect  is  the  same.  If  a  new  configuration  cannot  pass  this  test,  it  can  be 
rejected  without  doing  a  complete  analysis  of  it.  Table  9  in  Chapter  7  presents  a 
comparison  of  the  same  cost  function  with  and  without  screening. 

Note  that  this  prediction  function  is  not  perfectly  accurate.  This  differs  from 
the  situation  considered  by  Greene  and  Supowit;  a  configuration  may  pass  the 
screening  test  only  to  be  rejected.  However,  a  configuration’s  actual  maximum 
delay  can  only  be  greater  than  the  screening  function  predicts,  so  no  potentially 
acceptable  configurations  are  ever  eliminated  in  the  preliminary  stage.  As  a 
result,  the  theoretical  properties  of  the  algorithm  are  unaffected. 

Combining  Heuristics  and  Simulated  Annealing 

One promising area of exploration is the integration of simulated annealing
with other heuristics for sizing. A major problem with heuristic approaches is
that it is difficult to say ahead of time which solution is desirable for a critical
path: it may be important to size some transistors larger than they would
otherwise need to be due to their effect on other paths. If simulated annealing is
combined with a critical-path sizing heuristic, the heuristic can generate an
acceptable solution and then rely on simulated annealing to find the proper
solution.

One obvious way of combining the two approaches is to alternate them: allow
the annealing algorithm to run for a time, reduce the critical path delay using a
heuristic, and repeat the process. Another potential method is to use a heuristic
in the generation function of the annealing algorithm, with some random
perturbations thrown in. More simply, a heuristic may be used to give a starting
configuration. Finally, a post-processing heuristic may be used to improve the
solution generated by annealing.

One  particular  area  in  which  such  a  post-processing  heuristic  might  be  useful 
is  in  reducing  transistor  sizes.  Since  area  minimization  is  less  important  than 
reducing  the  delay  to  the  constraint,  transistors  not  on  the  critical  path  tend  to  be 
larger  than  they  need  to  be.  Detecting  and  then  examining  these  transistors  is 
certainly more efficient than allowing the annealing algorithm to continue.


Chapter  5 
MOST 

The  MOST  program  allows  the  various  approaches  to  transistor  sizing  to  be 
tested.  It  consists  of  PTA  (the  Prolog  Timing  Analyzer),  various  front  ends 
corresponding  to  different  heuristics,  and  a  simulated  annealing  interface.  MOST 
is not only a test program for these methods, but is a CAD tool in its own right.

MOST  is  designed  as  a  component  of  the  ASP  (Advanced  Silicon  Compiler  in 
Prolog)  project.  Compatibility  with  ASP  is  an  absolute  requirement;  this 
determines  the  implementation  language  and  interface  conventions  for  MOST. 
Furthermore, MOST is tuned to its use within ASP.

ASP 

The  goal  of  the  ASP  Project  is  to  produce  a  high-performance  silicon 
compiler tuned to the development of a logic processor. For a fuller description of
ASP, see McGeer et al. [12]. From the viewpoint of MOST, the salient feature of
ASP  is  that  it  defines  a  single  interface  for  all  its  component  programs:  the 
constrained  hierarchical  schematic  (CHS).  MOST  takes  its  input  in  this  format, 
and  simply  attaches  additional  constraints  to  the  schematic. 

A  CHS  contains  a  listing  of  inputs  and  outputs,  as  well  as  additional 
constraints  not  important  within  our  framework.  Functionally,  a  CHS  may  be 
either  a  primitive  (a  transistor,  for  example)  and  its  associated  structure  (in  this 
case,  the  source,  gate,  and  drain  signals,  as  well  as  the  gate  size)  or  a  collection  of 
subcells,  each  of  which  in  turn  is  a  CHS.  The  hierarchical  nature  of  the  CHS 
allows a building-block approach to silicon compilation, and the notion of
constraints  interacts  well  with  the  Prolog  language. 

Within ASP, MOST is meant to be run before layout takes place. This
implies that the exact lengths of the interconnections are not known, and so some
estimates have to be made of the parasitic resistances and capacitances.
Although this is not a restriction on the program (if exact values are available they
can be used), it is the normal situation, and so algorithms are designed with it
in mind. In particular, the timing analyzer currently uses the computationally
simple but less accurate lumped RC model for delay. The justification for this is
that since the uncertainty of the parasitics limits the accuracy of any delay
computations, there is no point in spending extra effort to arrive at similarly
inaccurate results.

The choice of Prolog as the implementation language (necessary for
compatibility with the rest of ASP) strongly influences the design of MOST. The
implementation of the simulated annealing algorithm makes great use of the
delayed binding and backtracking provided by Prolog. In effect, implicit pointers
throughout the data structures allow many variables to be equated, and then
binding one sets all the values simultaneously. When a procedure fails, however,
any assignments are undone.

These two features are used for substituting the next configuration into the
CHS. The variables in the CHS are collected in a pre-processing stage, and then
at each iteration this list of variables is unified with the list of sizes making up the
next configuration.

Instead of using delayed binding, the configuration could be substituted into
the CHS simply by traversing the entire structure. The cost of this traversal is
small compared to the cost of timing analysis, but it can still be substantial; a
straightforward implementation, done for comparison's sake, required over 0.8 cpu
seconds for a circuit of 96 transistors. For the same circuit, the unification takes
less than 0.1 cpu second.

Implementation

The  algorithms  used  by  MOST  have  already  been  described.  The  two  major 
ways  in  which  MOST  differs  from  the  pseudo-code  provided  above  are  in 
attention  to  storage  management  and  efficiency. 

The  thorniest  implementation  problem  was  that  of  iteration.  Both  simulated 
annealing  and  the  heuristics  fall  nicely  into  the  paradigm  of 

iterate(Circuit, Configuration) :-
    evaluate(Circuit, Configuration),
    adjust(Configuration, New_configuration),
    iterate(Circuit, New_configuration).

The  question  is  how  to  do  this  gracefully  within  the  framework  of  a  language  that 
has  no  destructive  assignment.  The  evaluation  step  consists  of  binding  the 
transistor  sizes  to  the  current  configuration  and  then  performing  a  circuit  analysis. 
When it comes time to evaluate the next configuration, however, the transistor
sizes  are  no  longer  unbound  variables! 

Short of using a technique such as difference lists (which mimics destructive
assignment substantially less efficiently) the only alternative is to make use once
again of Prolog's backtracking.

iterate(Circuit, Configuration) :-
    evaluate(Circuit, Current_configuration),
    adjust(Configuration, New_configuration),
    fail.

The  fail  unbinds  the  variables,  so  that  they  can  be  rebound  for  the  next 
evaluation. 

The difficulty in this approach is made clear in the call to evaluate: just
what configuration is being evaluated? The backtracking also throws away the
binding of New_configuration. The only way to retain this necessary
information is to assert it.


iterate(Circuit, Configuration) :-
    assert($current(Configuration)),
    retract($current(Current_configuration)),
    evaluate(Circuit, Current_configuration),
    adjust(Current_configuration, New_configuration),
    assert($current(New_configuration)),
    fail.


Although any use of assert violates Prolog's logical paradigm, this method is
fairly reasonable. Here assert is not being used to communicate between procedures,
only to retain information over backtracking within a single clause. From an
efficiency standpoint, relatively little (less than 1%) of the program's time is
spent in this assertion and retraction, despite the fact that somewhat more
information than just the configuration needs to be retained; both the heuristics
and the screening function of simulated annealing need the critical path equations
from the previous configuration.

The symbolic mathematics portion of MOST can evaluate, simplify, and take
derivatives of equations. The simplifier is rather mediocre; it does not deal with
the distributive law, for example. Its main purpose is to remove zeroes, and the
main requirement is that it be fast, so adding in more complicated laws would be
counter-productive.

Similarly, the equation evaluator needs to be fast. The equations being
evaluated can include unbound variables, which default to zero, so the standard
Prolog is function cannot be used. Analysis showed that is is roughly a factor of 10
faster than a symbolic evaluator, which is slowed primarily by the necessity for
unifying and setting up a new environment at each level of the expression tree;
since evaluation costs are a significant factor in the overall time, this penalty is
unacceptable.

Table 4
is vs. evaluation

circuit  equation  evaluation  is
size     size      time        time   ratio
-------  --------  ----------  -----  -----
6        37        0.06        0.02   3.0
8        109       0.17        0.02   8.5
24       119       0.21        0.02   10.5
48       263       0.42        0.03   14.0
96       551       0.90        0.06   15.0

What really needs to be done is to define an additional operator, say
default, such that

default(X) = 0 if X is unbound
default(X) = X otherwise.

Then is can be used freely simply by applying default to all the potentially
unbound variables. However, Cprolog does not permit the definition of new
arithmetic operators.
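
The intended semantics of default can be illustrated with a small Python
evaluator (Python is used only for illustration). The expression encoding, with
numbers, variable names, and nested (operator, left, right) tuples, is an
assumption made for this sketch and is not MOST's representation.

```python
def evaluate(expr, bindings):
    """Evaluate an expression tree; unbound variables count as 0."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return bindings.get(expr, 0)      # default(X): 0 when X is unbound
    op, left, right = expr
    a = evaluate(left, bindings)
    b = evaluate(right, bindings)
    if op == '+':
        return a + b
    if op == '-':
        return a - b
    if op == '*':
        return a * b
    if op == '/':
        return a / b
    raise ValueError('unknown operator: %r' % (op,))
```

For example, (R1 + R2) * (C1 + 3) with R1 = 2, C1 = 1 and R2 unbound evaluates
to (2 + 0) * (1 + 3) = 8. The recursive call at each level of the tree is the
per-node overhead that the built-in arithmetic avoids.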

The eventual solution was to prepare a modified version of the interpreter in
which unbound variables default to 0. This is a hideous hack, but the
performance gains are well worthwhile. The program will still run in standard
Prologs, however, due to the use of macro definitions of the relevant procedures.

The modified interpreter asserts the fact $fast_interpreter in its
environment, and so programs are able to test to see which version of the
interpreter is in use. Using the expand_term preprocessor, the changes can be
implemented in a completely transparent manner. When a program is being read
in, expand_term is applied to each clause (this is also how grammar rules are
implemented). If no expand_term succeeds, then the clause is simply asserted as
is; otherwise, the second argument of the appropriate expand_term is asserted.

% if Condition is true, assert the Then clause
expand_term(ifdef(Condition, Then, Else), Then) :-
    Condition, !.

% otherwise, the Else clause
expand_term(ifdef(Condition, Then, Else), Else) :-
    \+ Condition, !.


ifdef($fast_interpreter,
      (dummy(...) :-
          X is Eq,
          ...),
      % otherwise
      (dummy(...) :-
          evaluate(Eq, X),
          ...)).


The  modification  to  the  interpreter  greatly  increases  MOST's  speed.  Table 
10  in  chapter  7  contains  the  statistics;  the  gain  is  always  at  least  a  factor  of  two. 


Chapter  6 
PTA 

PTA is a vital component of MOST; it provides information for both the
heuristic and the simulated annealing top ends. Although it is scarcely an
advance on the state of the art (it is both slower and somewhat less accurate
than Crystal, for example), it has several interesting features. In particular, it is
tuned to repeated use as part of a sizing program, and thus preprocesses as much
of its input as possible; it orders series transistors if their order is initially
unknown; it provides symbolic equations for delay at the various nodes; and it
capitalizes on the hierarchical structure of the input schematic.

It is possible to modify PTA to use a more accurate delay model, so the lack
of accuracy is not inherent. PTA also has the ability to treat higher levels of
abstraction (logic gates or even macro cells) as primitives if the proper delay
model is provided. Currently, however, MOST does not take advantage of this
capability.

An  obvious  question  is  the  necessity  of  designing  a  new  timing  analyzer. 
Why  not  just  use  Crystal,  for  example,  with  (if  necessary)  a  few  modifications? 
One objection, of course, is that Crystal (or any other timing analyzer) is not
written in Prolog; however, the aesthetic desire for a system written entirely in
Prolog is not sufficient justification for reinventing the wheel.

In  order  to  understand  the  reasons  for  writing  PTA  from  scratch,  it  is 
necessary  to  understand  its  functions.  The  justification  will  thus  be  postponed 
until  after  a  description  of  PTA  and  its  implementation. 

Implementation 

PTA takes a CHS as its input. The output of PTA is the same CHS, with
additional timing information attached; the information is sufficient to reconstruct
the critical path to any node within the circuit. In addition, any initially
unordered series transistors will have orders attached.

If the CHS has subcells, then each subcell is analyzed in turn. The delay at
an output of the CHS is simply the maximum of the delays of that output in the
subcells. This process is repeated recursively until a primitive element is reached.
In  the  standard  version  of  PTA,  a  primitive  is  a  collection  of  transistors 
connecting  a  single  input  to  a  single  output. 

In order to process a primitive element, the delays of all its inputs must first
be  known.  This  may  involve  first  processing  other  CHSs  whose  outputs  are  inputs 
to  the  current  CHS;  since  the  information  is  retained,  this  does  not  cause  any 
additional  work,  just  reorders  the  schedule.  Each  path  to  each  input  will  thus  be 
considered.  Once  the  input  delays  are  known,  a  delay  modeler  for  the  primitive 
element  is  called  on  to  calculate  the  output  delays. 

analyze_chs(CHS) :-
    is_primitive(CHS),
    known_input_delays(CHS),
    process_primitive(CHS).

analyze_chs(CHS) :-
    \+ is_primitive(CHS),
    subcells(CHS, Cells),
    apply_to_each(analyze_chs, Cells).
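
The same recursion can be sketched in Python (used only for illustration). The
CHS class below is a hypothetical stand-in with a fixed delay per primitive,
and the sketch assumes the subcells are listed so that every input delay is
already known when a primitive is reached; PTA itself reorders the schedule
instead.

```python
from dataclasses import dataclass, field


@dataclass
class CHS:
    """Hypothetical stand-in for a constrained hierarchical schematic."""
    name: str
    inputs: list = field(default_factory=list)    # input node names
    output: str = ""                              # output node (primitives)
    stage_delay: float = 0.0                      # toy delay model (primitives)
    subcells: list = field(default_factory=list)


def analyze_chs(chs, delays):
    """Record in `delays` the latest arrival time at every output node."""
    if not chs.subcells:                          # primitive element
        arrival = max(delays[i] for i in chs.inputs) + chs.stage_delay
        # An output driven by several primitives keeps the maximum delay.
        delays[chs.output] = max(delays.get(chs.output, 0.0), arrival)
        return
    for cell in chs.subcells:                     # assumed dependency order
        analyze_chs(cell, delays)
```

Swapping stage_delay for a call to a delay modeler is what would let any level
of the hierarchy be treated as a primitive.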


This  scheme  has  several  benefits.  In  the  first  place,  it  permits  any  level  to  be 
viewed  as  a  primitive,  as  long  as  the  required  delay  modeler  is  supplied.  For 
example,  rather  than  going  down  to  the  transistor  level,  logic  gates  might  be 
considered  primitive.  Secondly,  it  capitalizes  on  the  hierarchical  structure,  which 
limits  the  number  of  paths  through  any  one  cell.  Finally,  it  eases  the  burden  on 
the delay modeler, which is able to assume that all the input delays are known.

The delay model currently used for transistors is the lumped RC model,
which views the entire resistance and capacitance of a stage as concentrated at
the end of the stage. Clearly, this model is pessimistic; however, it is reasonably
accurate, and computationally fast.

Each transistor in turn is considered as the trigger transistor, or the last
transistor to change. The delay on the stage given this choice of trigger transistor
is then computed, and the maximum of these delays is taken as the output delay.
The trigger transistor is also recorded, allowing later reconstruction of the critical
path.

[Figure: a transistor stage connected between VDD and GND.]

Transistors  X  and  Y  are  the  possible  choices  for  trigger  transistor. 

If the stage consists of the set of transistors X, each with its associated
resistance and capacitance; interconnections I; and drives capacitive load
$C_{out}$, then the delay with t as trigger is

$$D_t = \mathrm{InputDelay}_t + \Bigl(\sum_{x \in X} R_x + \sum_{i \in I} R_i + R_{in}\Bigr) \cdot \Bigl(\sum_{x \succeq t} C_x + C_{out}\Bigr)$$

where $x \succeq t$ means that $x$ is $t$ or follows $t$ in the path. In terms of
the above diagram, and neglecting interconnect resistance for simplicity,

$$C_{out} = C_A + C_B$$

$$D_X = \mathrm{InputDelay}_{X_{IN}} + (R_{in} + R_X + R_Y) \cdot (C_X + C_Y + C_{out})$$

$$D_Y = \mathrm{InputDelay}_{Y_{IN}} + (R_{in} + R_X + R_Y) \cdot (C_Y + C_{out})$$

If the primitive element contains unordered series transistors, they are
ordered before this computation takes place. Having known input delays allows
this ordering; otherwise, the final order of the transistors is not known when their
delay is calculated, and some assumption must be made. The only safe
assumption is the worst-case one for each transistor, but this leads to wildly
pessimistic results.
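
The trigger-transistor rule described above can be sketched in Prolog. This is only an illustration: the stage representation (a list of t(Name, R, C, InputDelay) terms in path order), the predicate names, and the numbers in the usage note are assumptions, and the capacitance sum is taken over the trigger and the transistors that follow it, as in the two-transistor example. Interconnect resistance is neglected.

```prolog
% Sketch only: the stage representation and predicate names are assumptions.
% A stage is a list of t(Name, R, C, InputDelay) terms in path order.

:- use_module(library(lists)).   % append/3, member/2, sum_list/2, max_member/2

% trigger_delay(+Stage, +Rin, +Cout, -Name, -D)
% On backtracking, treats each transistor in turn as the trigger.
trigger_delay(Stage, Rin, Cout, Name, D) :-
    total_r(Stage, Rs),
    append(_, [t(Name, _, C, In) | After], Stage),
    total_c(After, Cafter),
    D is In + (Rin + Rs) * (C + Cafter + Cout).

% Total resistance and capacitance of a list of transistors.
total_r(Stage, R) :-
    findall(X, member(t(_, X, _, _), Stage), Xs),
    sum_list(Xs, R).
total_c(Stage, C) :-
    findall(X, member(t(_, _, X, _), Stage), Xs),
    sum_list(Xs, C).

% stage_delay(+Stage, +Rin, +Cout, -Trigger, -Delay)
% The output delay is the maximum over all trigger choices; the trigger
% that produced it is returned as well.
stage_delay(Stage, Rin, Cout, Trigger, Delay) :-
    findall(D-N, trigger_delay(Stage, Rin, Cout, N, D), Pairs),
    max_member(Delay-Trigger, Pairs).
```

With the made-up stage [t(x,2,1,0), t(y,3,1,4)], Rin = 1 and Cout = 2, the two candidate delays are 0 + 6*4 = 24 (trigger x) and 4 + 6*3 = 22 (trigger y), so stage_delay/5 reports x as the trigger with a delay of 24.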

Determining the input resistance and output capacitance of a primitive
element will involve tracing a path through the circuit. The path must come
from an input to the circuit to the input of the CHS being considered; in the case
of a transistor, the path goes to the source. Finding paths is a classic Prolog
pseudo-breadth-first search problem; the simplistic implementation is quite
straightforward:


path(X, X, []).

path(X, Y, [connection(X,Z) | P2]) :-
    connection(X, Z),
    path(Z, Y, P2).


The use of connection is meant to hide the exact structure of CHSs; X and Y are
connected if there is a primitive CHS with X as an input and Y as an output. In
practice, however, this simple algorithm is not sufficient, since some additional
checking needs to be done.
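
A minimal runnable sketch of the simple path search, assuming a small invented netlist: the connection/2 facts and the helper all_paths/3 are hypothetical and serve only to show how backtracking enumerates every path between two nodes.

```prolog
% Toy netlist: these connection/2 facts are invented for illustration.
connection(in, a).
connection(a, b).
connection(a, c).
connection(b, out).
connection(c, out).

% path(+From, +To, -Path): Path is a list of connection(X,Z) steps.
path(X, X, []).
path(X, Y, [connection(X, Z) | P2]) :-
    connection(X, Z),
    path(Z, Y, P2).

% all_paths(+From, +To, -Paths): collect every path found on backtracking.
all_paths(From, To, Paths) :-
    findall(P, path(From, To, P), Paths).
```

On this acyclic example, all_paths(in, out, Ps) yields two paths, one through b and one through c. On a netlist containing a cycle the same predicate would not terminate, which is one reason the additional checking is needed.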

A cycle among CHSs implies a memory node or a latch. In this case, the
cyclic path is ignored. This check is potentially rather time-consuming if paths
are long, since it uses an order N^2 algorithm. In practice, however, it is relatively
inexpensive.

In order to avoid considering paths which are blocked by non-overlapping
signals, the list of signals influencing each path is retained. Only signals which
have been specified as potentially non-overlapping are included in this list. All
values are kept for each set of signals.

In effect, this mechanism trades the space for storing all the different
combinations for the time required to do recomputation if each case is considered
separately, as it is in a traditional timing analyzer. Potentially, storage can
increase combinatorially with the number of non-overlapping signals. Most paths,
however, do not involve more than one or two such signals, so the price is not too
great. Furthermore, it is exactly these paths where the values computed can be
used for multiple cases.

With  these  additions,  the  path  algorithm  is  somewhat  more  complex. 


path(X, X, Signals, Path, Path).

path(X, Y, Signals, Path_so_far, Path) :-
    connection(X, Z),
    % check for circularities
    \+ member(connection(X,Z), Path_so_far),
    % check for overlapping signals
    signals(connection(X,Z), S),
    overlap(S, Signals),
    add_signals(S, Signals, New_signals),
    path(Z, Y, New_signals, [connection(X,Z) | Path_so_far], Path).
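
The extended search can be tried on a toy netlist as follows. This is a sketch under stated assumptions, not PTA's code: edge/3 (a connection together with the signals that gate it), non_overlapping/2, and compatible/2 are hypothetical stand-ins for connection, signals, and overlap, and union/3 from the lists library stands in for add_signals.

```prolog
:- use_module(library(lists)).   % member/2, union/3

% Toy data (assumed): edge(From, To, Signals) lists the potentially
% non-overlapping signals that must be active for the edge to conduct.
edge(in, a,   [phi1]).
edge(a,  b,   [phi2]).   % blocked once phi1 is on the path
edge(a,  c,   []).
edge(c,  a,   []).       % closes a cycle a -> c -> a (e.g. a latch)
edge(c,  out, [phi1]).

non_overlapping(phi1, phi2).

% compatible(+S, +Signals): no signal in S is declared non-overlapping
% with a signal already collected along the path.
compatible(S, Signals) :-
    \+ ( member(A, S), member(B, Signals),
         ( non_overlapping(A, B) ; non_overlapping(B, A) ) ).

% path(+From, +To, +Signals, +SoFar, -Path)
% Path is accumulated in reverse order, most recent connection first.
path(X, X, _, Path, Path).
path(X, Y, Signals, SoFar, Path) :-
    edge(X, Z, S),
    \+ member(connection(X, Z), SoFar),    % ignore cyclic paths
    compatible(S, Signals),                % ignore blocked paths
    union(S, Signals, NewSignals),
    path(Z, Y, NewSignals, [connection(X, Z) | SoFar], Path).
```

Here findall(P, path(in, out, [], [], P), Ps) returns a single path, through c: the branch through b is rejected because phi2 cannot overlap phi1, and the cyclic edge from c back to a leads nowhere new because every connection out of a has then either been used or is blocked.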


PTA is tuned to its mode of use within MOST: repeated analyses of the same
schematic with additional sizes attached. As a result, before the first analysis of a
given circuit, it does as much preprocessing as possible, since preprocessing costs
only need to be paid once. All the paths within each individual CHS are
computed and stored (the use of hierarchies keeps this space requirement from
growing exponentially), as are symbolic formulas for each node’s output
capacitance. For a 100-transistor circuit, preprocessing requires 8 cpu seconds; by
comparison, the rest of the analysis only takes 13 seconds. The same algorithm
with the preprocessing removed requires 27 cpu seconds, substantially more than
the sum of the two times.

Justification

The strongest argument in favor of a completely new timing analyzer is the
absence of any Prolog timing analyzer. This is not just an aesthetic argument.
Due to the nature of the interpreter, interfacing problems are particularly
daunting. It is relatively easy to make use of a C procedure which returns a
numeric value simply by adding a hook to the interpreter allowing function calls,
but returning a structure is far more difficult.

In the first place, Prolog procedures do not return values, but rather unify
them with their arguments. This problem requires only “syntactic sugar” to
avoid, but the two languages represent structures differently; Prolog structures
need to be converted into C’s format when the procedure is called, then the C
structures must be massaged to convert them into the proper Cprolog format.
Finally, Prolog variables are fundamentally different from C variables (there is no
Prolog analog to assignment, for example, and pointers are implicitly
dereferenced); it is not at all clear how to remedy this difficulty.

Beyond these language issues, we reach the question of how much an existing
timing analyzer would need to be modified to fill its role as part of MOST. For
concreteness, Crystal is considered. At least four areas need to be dealt with:

1)  MOST requires symbolic delay equations from the timing analyzer. It is
    possible to add these to Crystal in much the same way they have been
    implemented in the current version of PTA; to correspond to unbound
    variables in the Prolog version, pointers to unfilled memory locations
    could be used in C.

2)  PTA must order series transistors in the cases where the order is not
    fixed ahead of time. No facility for unordered lists is present in Crystal,
    and even if this were added there would still be major difficulties. PTA
    only considers a CHS once all its input delays are known; given this
    information, it is possible to decide on an ordering for the transistors.
    Crystal does not make such a stipulation, so the ordering information
    may not be known or — even worse — may change as the circuit is
    being analyzed.

3)  In  the  ASP  environment,  transistor  sizing  takes  place  before  layout. 
This  implies  that  the  exact  interconnect  capacitances  are  not  known, 
and  some  estimates  must  be  made.  This  would  be  rather  simple  to  do 
within  the  framework  of  Crystal. 

4)  Crystal  cannot  deal  with  non-overlapping  signals.  A  human  designer 
can  do  case  analysis  by  fixing  on  each  possibility  in  turn;  this  results  in 
significant  recomputation,  however.  Once  again,  this  capability  could  be 
added  into  Crystal  in  the  same  way  it  has  been  implemented  in  PTA. 

Most of these features, then, could be added into the existing framework of
Crystal. On the other hand, these areas consumed the bulk of the time spent
implementing PTA, and it seems fair to assume that as much time would have been
required to modify Crystal. The final decision on whether to use Crystal was that the
implementation difficulties, unordered transistors, and the all-Prolog aesthetic
argument outweighed the already existing speed and accuracy of Crystal.

In retrospect, substantially more time than expected was spent implementing
PTA; however, almost all of this time was spent in the areas which would have
had to be added to Crystal as well. PTA is substantially slower than Crystal, and
because of its choice of the lumped RC timing model, somewhat less accurate.
Despite this, I believe the decision was a good one.

PTA’s accuracy can be improved by incorporating the distributed RC model
and taking waveform shape into account. These gains will, of course, be limited
by the fact that interconnect lengths are only estimates, but should make PTA’s
accuracy competitive with other timing analyzers. Furthermore, from a
development standpoint, there was a great advantage in being able to work with
the relatively simple equations of the lumped RC model. The extra accuracy of
the slope model in particular is accompanied by a dramatic increase in complexity
of the equations.

Admittedly, PTA is at least an order of magnitude slower than Crystal, and
for some choices of algorithms this is the limiting factor for the MOST program.
On the other hand, this difference is simply due to the fact that Cprolog is
interpreted. Estimates for the performance of the PLM machine [1], in
conjunction with the Berkeley Prolog compiler [18], predict a 200-fold increase in
speed.


Chapter  7 
Results 

Five algorithms have been evaluated.

SIMPLE
    — a simple scapegoat heuristic: the size of the “most useful” transistor
      (the transistor whose modification does the most good) is increased by
      one

MS  — the size of the most useful transistor is increased by a varying amount

MT  — the size of any transistor whose increase would reduce delay is
      increased by one

CP  — a critical path heuristic which uses partial derivative information for
      the path sizing

ANNEAL
    — simulated annealing using screening and the cost function

      $$\mathrm{Cost} = 5 \cdot \mathrm{Penalty}_{max} + 5 \cdot \sum_{i \in \mathrm{nodes}} \mathrm{Penalty}_i + \mathrm{TotalSize}$$

      Due to the random nature of this algorithm, there is a fair amount of
      uncertainty in the results quoted for ANNEAL, most of which are based
      on only a few runs.
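
The annealing cost function can be written out as a small Prolog sketch. The weights 5, 5 and 1 are the ones given in the text; the argument layout, the predicate names, and the reading of each penalty as the amount by which a delay exceeds the requested limit are assumptions made for illustration.

```prolog
:- use_module(library(lists)).   % member/2, sum_list/2

% penalty(+Limit, +Delay, -P): excess of Delay over Limit, never negative.
penalty(Limit, Delay, P) :-
    P is max(0, Delay - Limit).

% cost(+Limit, +MaxDelay, +NodeDelays, +Sizes, -Cost)
% Cost = 5*Penalty_max + 5*(sum of node penalties) + TotalSize.
cost(Limit, MaxDelay, NodeDelays, Sizes, Cost) :-
    penalty(Limit, MaxDelay, PMax),
    findall(P, ( member(D, NodeDelays), penalty(Limit, D, P) ), Ps),
    sum_list(Ps, PSum),
    sum_list(Sizes, TotalSize),
    Cost is 5 * PMax + 5 * PSum + TotalSize.
```

For a made-up case with limit 10, maximum delay 14, node delays [14, 12, 9] and sizes [2, 2, 3], the penalties are 4 and 4 + 2 + 0 = 6, giving a cost of 20 + 30 + 7 = 57. Once every delay is within the limit, both penalty terms vanish and only the total size is left to minimize.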

The first question to be considered is how well the various algorithms
perform on two mid-sized circuits. A one-bit full adder consisting of 24 transistors
is a small enough circuit that all the algorithms perform reasonably well.

Table 5

Algorithm performance — 1-bit adder
(24 transistors)

             delay reduction       size      timing    cpu time
algorithm   requested  achieved  increase   analyses   (seconds)

SIMPLE         30%       31%       13%          7         18.7
               50%       51%       63%         31         80.3
               60%       60%      131%         64        163.0

MS             30%       44%       67%          6         16.4
               50%       50%      152%         13         34.4
               60%       63%      298%         24         63.1

MT             30%       31%       19%          3          8.7
               50%       50%       77%         11         29.4
               60%       60%      148%         20         52.8

CP             30%       33%       25%          3         15.3
               50%       51%       94%          9         47.2
               60%       60%      127%         15         80.7

ANNEAL         30%       42%      152%          6         23.3
               50%       52%      218%         11         40.3
               60%       60%      194%         84        230.9

Doubling the size of the circuit to two bits and 48 transistors causes problems
for the SIMPLE heuristic and the simulated annealing algorithm. The results
reported by Fishburn and Dunlop are included for comparison; the time is for a
68000-based workstation running C code.


Table 6

Algorithm performance — 2-bit adder
(48 transistors)

             delay reduction       size      timing    cpu time
algorithm   requested  achieved  increase   analyses   (seconds)

SIMPLE         30%       31%       [?]        [?]         93.0
               50%       50%       [?]        [?]        430.0

MS             30%       40%       35%          8         51.4
               50%       52%      272%         34        207.5
               60%       60%      275%         50        301.4

MT             30%       38%       29%        [?]         32.5
               50%       50%       90%        [?]         92.4
               60%       61%      261%        [?]        285.0

CP             30%       [?]       35%          3         48.3
               50%       [?]      173%         11        192.9
               60%       [?]      355%         22        373.5

ANNEAL         30%       [?]       [?]          9         79.4
               50%       [?]       [?]         75        553.0

TILOS                    43%       32%                     6

[?]: value illegible in the source

Note that although it requires more cpu time, SIMPLE is the most effective
in  limiting  transistor  size  increase.  In  general,  cleverer  algorithms  tend  to 
overestimate  the  sizes  of  transistors  not  on  the  eventual  critical  path.  MS  is 
particularly  prone  to  this  problem  because  of  the  difficulty  of  deciding  on  the 
proper  size  for  a  transistor  before  surrounding  transistors  have  had  their  final  size 
determined. 

PTA is currently limited to circuits of approximately 100 transistors. As a
result, it is difficult to say how well various algorithms scale to larger circuits.
The available data suggest that run time grows roughly quadratically with the
size of the circuit.


Table 7

Algorithm performance vs. circuit size
(cpu time in seconds)

                                 Circuit Size
Algorithm   Reduction       8       24       48       96

SIMPLE         30%         8.0     18.7     93.0    451.2
               50%        11.5     80.3    430.0      *

MS             30%         6.7     16.4     51.4    239.0
               50%         6.7     34.4    207.5   1095.8

MT             30%         8.2      8.7     32.5     84.9
               50%        11.9     29.4     92.4    331.2

CP             30%        11.0     15.3     48.3    137.6
               50%        11.0     47.2    192.9      †

ANNEAL         30%         4.5     23.3     79.4      †
               50%        16.1     40.3    553.0      †

* — failed to satisfy request
† — data not yet available
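
One way to check the "roughly quadratic" reading against Table 7 is to estimate a growth exponent from two measurements: if time grows as size^K, then K = log(T2/T1) / log(N2/N1). The predicate below is a sketch of that arithmetic; its name and argument order are assumptions.

```prolog
% exponent(+N1, +T1, +N2, +T2, -K)
% K is the slope of log(time) against log(size) between two data points.
exponent(N1, T1, N2, T2, K) :-
    K is log(T2 / T1) / log(N2 / N1).
```

For SIMPLE at a 30% reduction, going from 48 to 96 transistors (93.0 to 451.2 seconds) gives K of about 2.3; for MT at 50% (92.4 to 331.2 seconds) it gives about 1.8. Both are in the neighborhood of 2, though two points per estimate is thin evidence.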

As the circuit gets larger, time for symbolic derivatives increases
dramatically. For even the 48-transistor circuit, over 50% of the computation
time is spent taking derivatives; for 96 transistors, the percentage increases to
above 70%. A faster derivative procedure should make this method more
competitive with the others.

The somewhat arbitrary example of a chain of four inverters driving a fairly
large output capacitance makes a good test of how the algorithms perform while
driving the circuit as close as possible to its optimum size. Although this is rather
impractical — reducing the delay by 85% increases the circuit’s size by a factor of
15 — it is nonetheless a measure of the capabilities of the algorithms.


Table 8

Performance near optimum configuration
Chain of 4 inverters — 8 transistors
(figures are cpu times to attain reductions)

requested                      Algorithm
delay         SIMPLE      MS       MT       CP     ANNEAL

30%              8.0      6.7      8.2     11.0      4.5
40%              8.0      6.7      8.2     11.0      6.4
50%             11.5      6.7     11.9     11.0     16.4
60%             18.5      6.7     15.4     11.0     20.0
70%             38.9     11.9     26.0     19.6     29.5
80%            126.9     30.3     51.7     47.8    116.0
85%            317.6     55.5    102.7    114.5    416.5
90%               *     394.1       *        *        *

* — failed to satisfy request

The  screening  procedure  in  the  simulated  annealing  algorithm  does  indeed 
boost  efficiency  substantially.  The  savings  increase  with  the  size  of  the  circuit  (as 
timing  analysis  becomes  more  expensive)  and  with  the  percentage  reduction 
requested  (as  more  configurations  become  “obviously  wrong”). 

The same cost function was used both with and without screening. One
method of seeing the increase in efficiency is to calculate the “success rate”:
how often an evaluation results in an acceptance.


Table 9

Advantages of screening
(averages of 10 runs)

Circuit   Delay                  Evaluation   Success       Cpu
Size      Reduction   Screen?    Percentage   Percentage    Time

24        30%         NO           100%         [?]          [?]
                      YES           70%         [?]          [?]

24        50%         NO           100%         40%          74.4
                      YES           57%         77%          54.6

48        30%         NO           100%         37%          [?]
                      YES           47%         68%          [?]

48        [?]         NO           [?]          15%        1100.1
                      YES          [?]          37%         631.7

* — only one run due to cpu time
[?]: value illegible in the source

Finally, it is worth investigating how useful the modification to the
interpreter  actually  was.  Table  10  includes  the  ratio  of  the  time  to  evaluate  an 
expression to the time required by is. Simply multiplying this ratio by the time
spent  in  evaluating  equations  using  the  fast  interpreter  will  give  a  fairly  good 
estimate  of  the  time  required  to  evaluate  the  equations  in  the  standard 
interpreter.  Different  algorithms  do  different  amounts  of  evaluation,  so  the  exact 
benefit  they  obtain  differs,  but  it  is  invariably  large. 


Table 10

Performance gains due to modified interpreter

                       Modified Interpreter    Standard Interpreter
            Circuit        (measured)              (projected)
Algorithm   Size        is time     Total      Eval Time     Total     Ratio

MS            8           [?]        [?]          11.9        17.2       2.6
             24           [?]        [?]          95.5       110.8       4.5
             48           [?]        [?]         919.8      1061.6       5.1
             96         383.7        [?]        5755.5      5957.1      10.2

MT            8           2.9       11.9          24.6        33.6       2.8
             24           7.4       29.4          77.7        99.7       3.4
             48          27.8       92.4         389.2       453.8       4.9
             96         138.2      331.2        2073.0      2266.0       6.8

CP            8           [?]        [?]           3.4         4.1       3.7
             24           [?]        [?]          53.5        95.6       2.0
             48           [?]        [?]         196.0       374.0       1.9

ANNEAL        8           2.9       20.6          24.6        42.3       2.1
             24           5.6       40.3          58.8        93.5       2.3
             48         182.5      553.0        2555.0      2925.5       5.3

[?]: value illegible in the source
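
The projection behind Table 10 can be reproduced with a few lines of arithmetic. The sketch below assumes that the projected total is the measured total with the time spent in is replaced by the standard interpreter's evaluation time; this reading is inferred from the complete rows of the table, and the predicate names are hypothetical.

```prolog
% eval_time(+IsTime, +Ratio, -EvalTime)
% Scale the time measured in the fast evaluator by the eval-to-is ratio.
eval_time(IsTime, Ratio, EvalTime) :-
    EvalTime is Ratio * IsTime.

% projected_total(+Total, +IsTime, +EvalTime, -Projected)
% Swap the fast evaluation time for the standard interpreter's time and
% leave the rest of the run unchanged.
projected_total(Total, IsTime, EvalTime, Projected) :-
    Projected is Total - IsTime + EvalTime.
```

For MT on the 96-transistor circuit, projected_total(331.2, 138.2, 2073.0, P) gives approximately 2266.0, the projected total listed in the table; the 24- and 48-transistor MT rows check out the same way (about 99.7 and 453.8).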

Chapter  8 
Conclusion 


Summary  of  results 

Both simulated annealing and heuristic methods can reduce delay through a
circuit by 50% in a few minutes of CPU time using a simple delay model.
Heuristics tend to be more efficient and produce smaller final circuits; even
simple heuristics give surprisingly good results. Although no guarantees can be
provided, several of these approaches almost invariably succeed in satisfying
reduction requests up to 60%.

Using symbolic equations is a key to improved performance both for
heuristics and simulated annealing. Using a more accurate delay model might
cause the complexity of these equations to increase dramatically, and so it is not
clear how this would affect the program’s speed.

The  limitations  on  circuit  size  are  largely  a  function  of  the  Prolog 
implementation  in  use,  particularly  its  failure  to  perform  tail-recursion 
optimization.  Other  than  this,  performance  scales  fairly  well  with  size. 

Future  work 

Many promising areas for research are still almost untouched. Several have
been mentioned in passing above; this final section discusses them in somewhat
greater detail.

The most attractive possibility is taking advantage of the circuit’s hierarchical
structure. As mentioned, PTA is able to view different levels of abstraction as
primitive; this feature was added primarily for the benefit of MOST, but no use
has been made of it so far. Instead of sizing the entire circuit simultaneously, it
should be more efficient to assign delay goals to cells and then size the cells
recursively.

Annealing and heuristics should certainly be combined, as should heuristics of
different types. For different circuits, different approaches are desirable; some
way of determining what method is right for a given circuit would be extremely
useful.

PTA is easily modifiable to include a more accurate delay model. The
Penfield-Rubinstein-Horowitz distributed RC model is only slightly more complex
than the lumped RC model, and current sizing techniques should continue to
perform much the same. Models taking the waveform’s slope into account cause
more difficulty, but provide potentially large rewards. Of course, from the
standpoint of MOST’s usage within ASP, the increased accuracy will do little
good, due to the estimates of interconnect length, but it is important for use
as a stand-alone tool.

From an aesthetic standpoint, using heuristics is rather unsatisfactory.
Fishburn and Dunlop’s work [2] points the way towards a sounder theoretical
basis, but is currently restricted to the distributed RC model. This result needs to
be extended towards a more general model. Additionally, decomposition
techniques such as Matson’s — much like the hierarchical decomposition described
above — show great promise.

Appendix

Source  Code  of  MOST 


% this is the top-level file which causes the others to be loaded in

:- ([-'UTILS/macros',   % this has to be first so it applies to the others
     -'anneal',
     -'heuristic',
     -'PTA/pta',
     -'PTA/ppp',
     -'PTA/order',
     -'PTA/primitive',
     -'PTA/critpath',
     -'PTA/delay',
     -'UTILS/utils',
     -'UTILS/symbolic',
     -'UTILS/structs',
     -'UTILS/print',
     -'UTILS/makechs',
     -'UTILS/random',
     -'UTILS/minimize']).

% ifdef macros.  In order to have precedence over the other macros,
% these need to be applied first, and so must be 'asserta'ed.

:- asserta((expand_term(ifdef(Condition,Clause1,Clause2),Clause2) :-
        ...)).                  % body of this clause is illegible in the source
:- asserta((expand_term(ifdef(Condition,Clause1,Clause2),Clause1) :-
        Condition, !)).

% prolog code to do simulated annealing

measure(Delay) :-
    N is cputime,
    pp(Chs),
    make_sizes(Chs,Vars),
    initialize(Chs,Delay,Vars,Initial),
    anneal(Delay,Chs,Vars,Final,Cost,Actual),


print  (Tsize) , nl 


prin 


measure(_) :-
    print('Failure'), nl,
    fail.

initialize(L,Delay,Vars,Init_delay) :-
    length(Vars,Number),
    print('There are '), print(Number), print(' transistors to size'), nl,
    init_configuration(Vars,Init),
    cost(L,Delay,Vars,Init,Init_cost,Init_delay,Init_eq,Init_other),
    init_stopinfo(Init_cost,Stopinfo),
    init_temperature(Delay,Init_delay,T),
    note_values(Init,Init_cost,Init_delay,Init_eq,Init_other),
    fail.
initialize(L,Delay,Vars,Init_delay) :-
    $current_delay(Init_delay),
    !.


anneal(Delay,Chs,Vars,X,Xc,Xd) :-
    repeat,
    get_temperature(T),
    get_stopinfo(Stopinfo),
    iterations_at_temp(T,N),
    inner_loop(N,T,Delay,Chs,Vars,X,Xc,Xd),
    update_stopinfo(Stopinfo,Xc,New_si),
    ((stop(Delay,Xd,New_si) ;           % success
                                        % or
      give_up(New_si)) ;                % failure
                                        % or
     (update_temperature(T,New_t),      % keep going
      fail)).                           % the fail returns to the repeat


% inner loop goes through N iterations at the specified temperature
inner_loop(N,T,Delay,Chs,Vars,J,Cost,Actual) :-
    range(1,I,N),
    get_values(X,Xcost,Xdelay,Xdeq,Xoeq),
    generate(X),
    screen(X,Xdeq,Xoeq,Xcost,Delay,T,R),
    make_real_configuration(X,J),
    cost(Chs,Delay,Vars,J,Cost,Actual,Deq,Oeq),
    accept(Xcost,Cost,T,R),
    replace_values(J,Cost,Actual,Deq,Oeq),
    Actual < Delay.     % succeed only if done
    % otherwise, fail and the retry goes back to range

inner_loop(_,_,_,_,_,X,Xcost,Xdelay) :-
    get_values(X,Xcost,Xdelay,Xdeq,Xoeq).

% the screening function -- throw out "obviously wrong" configurations
screen(Config,Deq,Oeq,Old_cost,Delay,T,R) :-
    size_cost(Config,Size_cost),
    Diff is Deq - Delay,
    max(0,Diff,Delay_cost),
    map(other_penalty,Delay,Oeq,Penalties),
    sum(Penalties,Other_cost),
    make_cost(Delay_cost,0,Size_cost,Test_cost),
    random(R),
    !,
    accept(Old_cost,Test_cost,T,R),
    !.

other_penalty(Delay,Eq,Penalty) :-
    Actual is Eq - Delay,
    max(0,Actual,Penalty).

accept(Xcost,Jcost,T,R) :-
    Del_c is Jcost - Xcost,
    f(Del_c,T,Y),
    R < Y.

f(Del_c,_,1) :-
    Del_c < 0.
f(Del_c,T,Y) :-
    Y is exp(-Del_c/T).

% annealing utility functions, including initializing and updating
% parameters.  most of these are very sketchy, and lots of useful
% work could no doubt be done here

init_temperature(Delay,Init_delay,T) :-
    T is (Init_delay - Delay)/4,
    asserta($current_temp(T)),
    !.

init_stopinfo(Init,[Init,0,1,2]) :-     % as long as they're different, it's cool
    asserta($current_info([Init,0,1,2])),
    !.

% currently, iterations_at_temp doesn't depend on the temperature.  clearly
% it should for better performance.  sorry.
iterations_at_temp(Temp,25) :- !.

% this should also be somewhat more complex
update_temperature(T,Newt) :-
    Newt is 0.8*T,
    asserta($current_temp(Newt)),
    !.

get_temperature(T) :-
    retract($current_temp(T)),
    !.

update_stopinfo([X1,X2,X3,_],X,[X,X1,X2,X3]) :-
    asserta($current_info([X,X1,X2,X3])),
    !.

get_stopinfo(Stopinfo) :-
    retract($current_info(Stopinfo)),
    !.

stop(Delay,Xdelay,Stopinfo) :-
    Xdelay < Delay.

% give up if no change (in cost) in three iterations
give_up([X,X,X,X]).

% try to make the asserts and retracts as transparent as possible

note_values(X,Xc,Xd,Xdeq,Xoeq) :-
    asserta($current_config(X,Xdeq,Xoeq)),
    asserta($current_cost(Xc)),
    asserta($current_delay(Xd)),
    !.

get_values(X,Xc,Xd,Xdeq,Xoeq) :-
    $current_config(X,Xdeq,Xoeq),
    $current_cost(Xc),
    $current_delay(Xd),
    !.

replace_values(X,Xc,Xd,Xdeq,Xoeq) :-
    retract($current_config(_,_,_)),
    retract($current_cost(_)),
    retract($current_delay(_)),
    note_values(X,Xc,Xd,Xdeq,Xoeq),
    !.


% interface the simulated annealing algorithm with the timing analyzer

% cost takes the variables as its third argument, and the actual configuration
% as its fourth.  This binds them, and thus sets the sizes in the Chs itself

cost(Chs,Delay,Config,Config,Cost,Actual_delay,Delay_eq,Other_eq) :-
    process_chs([],Chs),
    find_criticalpath(Chs,Cp),
    max_delay_cost(Chs,Cp,Delay,Actual_delay,Delay_cost,Delay_eq),
    other_delay_cost(Chs,Cp,Delay,Other_cost,Other_eq),
    size_cost(Config,Size_cost),
    make_cost(Delay_cost,Other_cost,Size_cost,Cost),
    !.

max_delay_cost(Chs,Cp,Delay,Actual_delay,Cost,Eq) :-
    delay_equation(Cp,Eq),
    delay(Cp,Actual_delay),
    Diff is Actual_delay - Delay,
    max(Diff,0,Cost).

other_delay_cost(Chs,Cp,Delay,Total_over,Eq) :-
    all_delays_over(Chs,Delay,Total_over,Eq).

size_cost(Config,Size) :-
    total_size(Config,Size).

all_delays_over(Chs,Delay,Total,Eqs) :-
    signals(Chs,Signals),
    map(delay_equation,Signals,Eqs),
    map(penalty,Delay,Signals,Penalties),
    sum(Penalties,Total).

penalty(Delay,Entry,Penalty) :-
    delay(Entry,This_delay),
    Diff is This_delay - Delay,
    max(Diff,0,Penalty).

total_size(Config,Size) :-
    map(gate_size,Config,Combined),
    sum(Combined,Size).

% generate modifies the old configuration, which is in the form Size+Change
% by binding Change to some number

generate(Old) :-
    minimum_gate_size(Min),
    apply_to_each(perturb,Min,Old).

perturb(_,Size+Change) :-
    number(Change).
perturb(Min,Size+Change) :-
    var(Change),
    perturbation(Change),
    Size + Change >= Min,
    !.
perturb(Min,Size+0).
perturb(_,Size) :-
    number(Size).

perturbation_size(2).           % maximum perturbation

perturbation(Change) :-
    perturbation_size(Max_change),
    Mmc is -Max_change,
    Maxc is Max_change+1,
    random_int(Mmc,Maxc,Change),
    !.

% the initial configuration is simply with all gates at the minimum size
init_configuration(Vars,Init) :-
    map(init_gate_size,Vars,Init).

init_gate_size(X,Y+_) :-
    var(X),
    minimum_gate_size(Y).

% unless, of course, they happen to have a size already fixed
init_gate_size(X,X) :-
    number(X).

make_cost(Delay_cost,Other_cost,Size_cost,Cost) :-
    max_weight(K1),
    all_weight(K2),
    size_weight(K3),
    Cost is K1 * Delay_cost + K2 * Other_cost + K3 * Size_cost.

max_weight(5).
all_weight(5).
size_weight(1).

make_real_configuration(Config,Trial) :-
    map(free,Config,Trial).

% free changes from the form Size+Change (with both bound) to the form
% Newsize+_, where Newsize = Size+Change
free(S,Result+_) :-
    gate_size(S,Result).

% try a heuristic
try_heuristic(Delay) :-
    clear_globs,
    N is cputime,
    pp(L),
    make_sizes(L,Vars),
    init_configuration(Vars,Init),
    asserta($current_configuration(Init)),
    heuristic_iterate(L,Vars,Delay),
    print('final configuration is '), print(Vars), nl,
    total_size(Vars,Size), print('final size is '), print(Size), nl,
    print('time required is '), Time is cputime - N, print(Time), nl.

heuristic_iterate(Chs,Vars,Delay) :-
    repeat,
    current_configuration(Config),
    cost(Chs,Delay,Vars,Config,_,Actual,Delay_eq,_),
    (Actual < Delay -> print('whee!'), nl ;
     make_next_config(Config,Delay,Delay_eq),
     fail).

make_next_config(Config,Cost,Eq) :-
    collect_vars(Eq,[],Vars),
    partials(Eq,Vars,Derivs),
    map(zero,Vars,Constraints),         % oversimplification, they might
                                        % already have values, in which case
                                        % they could decrease
    minimize(Eq,Cost,Vars,Constraints,Derivs,New_cost),
    make_real_configuration(Config,New),
    retract($current_configuration(_)),
    asserta($current_configuration(New)),
    !.

current_configuration(X) :-
    $current_configuration(X),
    !.

% MS heuristic -- increase size of transistor by more than 1

clear_globs :-
    abolish(best,2),
    abolish(this,1),
    abolish(bump,1).

choose_b®st (Con  f i g , Cost . Eq . Actual ) 
asserta  (best (0, Cost) ) . 

try_each  (1 . Config, Eq, Cost) , 
retract  (best ( _ Actual))  . 

try_each  (_,  []  .  _.  _)  • 
ifdef ($fast_interpreter, ( 
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%  hacked  interpreter 

try_each (N, [1  1  Config] , Eq, Cost) 

This_one  is  Eq« 

replace_if_necessary  (N.This_one.Cost)  , 

fail)  , 

%  normal  interpreter 
(try_each(N,  [1  1  Config]  ,Eq,Cost) 
evaluate  (Eq,This_one) , 

replace_if_necessary  (N.This_one, Cost)  , 

fail) )  . 

■try_each(N,  [Kod  1  Config3  * Eq, Cost) 

N1  is  N+1, 

ti  y_each  (N1 ,  Conf  ig,  Eq,  Cost)  , 

choose_size (Mod, Eq, Best, N) . 


choose_size(Mod, Eq, Best, N) :-
    is_best(N),
    !,
    choose_individual_size(Mod, Eq, Best).

choose_size(0, _, _, _).

replace_if_necessary(N, Current, Cost) :-
    best(_, Best),
    Current < Best,
    retract(best(_, _)),
    asserta(best(N, Current)),
    !.

is_best(N) :-
    best(N, _),
    !.

choose_individual_size(Mod, Eq, Best) :-
    best(_, One),
    try_sizes(Mod, 2, Eq, One, Best, 1, Choice),
    asserta(best_size(Best, Choice)),
    fail.

choose_individual_size(Mod, Eq, Best) :-
    retract(best_size(Best, Mod)),
    !.

try_sizes(Mod, Mod, Eq, Best, Best, Choice, Choice) :-
    max_change(K),
    Mod > K,
    !.

ifdef($fast_interpreter, (
try_sizes(Mod, Mod, Eq, Best, Best, Choice, Choice) :-
    This is Eq,
    asserta(this(This)),
    This > Best,
    !),

% usual interpreter
(try_sizes(Mod, Mod, Eq, Best, Best, Choice, Choice) :-
    evaluate(Eq, This),
    asserta(this(This)),
    This > Best,
    !)).

try_sizes(Mod, Current, Eq, _, Best, _, Choice) :-
    retract(this(Bsf)),
    Next is 2*Current,
    try_sizes(Mod, Next, Eq, Bsf, Best, Current, Choice).
max_change(6).

process_chs(Env, Chs) :-
    is_processed(Chs),   % don't want to reprocess
    !.

process_chs(Env, Chs) :-
    is_primitive(Chs),
    process_primitive(Chs, Env),
    !.

process_chs(Env, Chs) :-
    subcells(Chs, cells(Subcells)),
    apply_to_each(process_chs, [Chs|Env], Subcells),
    all_max_delays(Chs, Subcells).

is_primitive(Chs) :- is_net(Chs).

% processing only one signal in a chs means that we don't need to
% process *all* the subcells, just the ones in which the signal is
% an output
process_signal_in_chs(Env, Sig, Chs, Signal_entry) :-
    subcells(Chs, cells(Subcells)),
    relevant_cells(Subcells, Sig, Relevant),
    apply_to_each(process_chs, [Chs|Env], Relevant),
    find_signal_entry(Sig, Chs, Signal_entry),
    max_delay(Relevant, Signal_entry).

get_signal_entry(Sig, Chs, Entry) :-
    signals(Chs, Signals),
    assoc(Sig, Signals, Entry).


relevant_cells([], _, []).
relevant_cells([C|Cs], Sig, [C|Newcs]) :-
    is_output(Sig, C),
    !,
    relevant_cells(Cs, Sig, Newcs).
relevant_cells([C|Cs], Sig, Newcs) :-
    relevant_cells(Cs, Sig, Newcs).

%  the  subcells  are  done,  and  so  each  output  will  have  a  delay  for  several 
%  of  the  subcells.  find  the  max. 

all_max_delays(Chs, Subcells) :-
    signals(Chs, Signals),
    apply_to_each(one_max_delay, Subcells, Signals).

% for a particular signal, collect 'em all
one_max_delay(Subcells, Signal_entry) :-
    max_delay(Subcells, Signal_entry).

max_delay(Subcells, Signal_entry) :-
    signal_name(Signal_entry, Sig),
    find_max_delay(Subcells, Sig, Signal_entry).

set_signal_delay(sig(_, D, _), D).


make_dummy_signal_entry(sig(_, 0, _)) :- !.

find_max_delay([], _, Dummy) :- make_dummy_signal_entry(Dummy).

find_max_delay([Cell|Cells], Sig, Entry) :-
    find_signal_entry(Sig, Cell, E0),   % this will fail if Sig isn't in Cell
    !,
    find_max_delay(Cells, Sig, E1),
    bigger_delay(E0, E1, Entry).

find_max_delay([Cell|Cells], Sig, Entry) :-
    find_max_delay(Cells, Sig, Entry).

bigger_delay(D1, D2, D1) :-
    delay(D1, Delay1),
    delay(D2, Delay2),
    Delay1 > Delay2,
    !.
bigger_delay(D1, D2, D2).

delay(Signal, Delay) :- arg(2, Signal, Delay).

delay_in_cell(Chs, Sig, Delay, Info) :-
    make_signal_entry(Sig, Delay, Info, Entry),
    find_signal_entry(Sig, Chs, Entry).

make_signal_entry(Sig, Delay, Info, sig(Sig, Delay, Info)).

process_primitive(Chs, Env) :-
    set_input_delays(Chs, Env),
    attach_orders(Chs),
    output_signals(Chs, O),
    apply_to_each(primitive_delay, [Chs|Env], O).

set_input_delays(Chs, Env) :-
    inputs(Chs, I),
    apply_to_each(check_input, Env, I).

check_input(_, I) :-
    known_input_delay(I),
    !.

check_input(Env, I) :-
    signal_name(I, Name),
    delay_in_env(Env, Name, Delay_entry),
    set_input_delay(I, Delay_entry).

known_input_delay(in(_, Delay, _)) :-
    number(Delay).

set_input_delay(in(_, Delay, Info), sig(_, Delay, Info)).

% delay in env -- make sure a given signal has a known delay
delay_in_env([Chs|Env], Sig, Delay) :-
    known_delay(Chs, Sig, Delay),   % it does already
    !.

delay_in_env([Chs|Env], Sig, Delay) :-
    is_input(Sig, Chs, Delay),   % it doesn't but it's outside our current chs
    delay_in_env(Env, Sig, Delay).

delay_in_env([Chs|Env], Sig, Delay) :-
    process_signal_in_chs(Env, Sig, Chs, Delay).

known_delay(Chs, Signal, Entry) :-
    find_signal_entry(Signal, Chs, Entry),
    known_signal_delay(Entry).

known_signal_delay(sig(_, Delay, _)) :-
    \+ var(Delay).

find_signal_entry(Name, Chs, Entry) :-
    signals(Chs, Signals),
    assoc(Name, Signals, Entry).

is_processed(Chs) :-
    signals(Chs, Signals),
    apply_to_each(known_signal_delay, Signals).

% path pre-processing


% make_signals collects all the signal names and puts in slots in the outputs
% for the delays

make_signals(Chs, [O]) :-
    is_net(Chs),
    !,
    inputs(Chs, I),
    add_delays_to_inputs(I, NewI),
    output_signals(Chs, [O]),   % only one output for a net
    add_delays_to_outputs([O], [NewO]),
    signals(Chs, [NewO|NewI]).

make_signals(Chs, O) :-
    subcells(Chs, cells(Cs)),
    map(make_signals, Cs, Subcell_outputs),
    flatten(Subcell_outputs, Temp),
    remove_dupes(Temp, [], O),
    inputs(Chs, I),
    add_delays_to_inputs(I, NewI),
    add_delays_to_outputs(O, NewO),
    append(NewI, NewO, Sigs),
    signals(Chs, Sigs).

make_outputs(Env, Chs) :-
    outputs(Chs, Outputs),
    apply_to_each(symbolic_terminal_capacitance, Env, Outputs),
    (is_primitive(Chs) -> true ;
     subcells(Chs, cells(Sub)),
     apply_to_each(make_outputs, [Chs|Env], Sub)).

add_delays_to_inputs([], []).
add_delays_to_inputs([in(Name, Delay, Info)|Is], [sig(Name, Delay, Info)|Xs]) :-
    add_delays_to_inputs(Is, Xs).
add_delays_to_inputs([in(Name)|Is], [sig(Name, _, _)|Xs]) :-
    add_delays_to_inputs(Is, Xs).

add_delays_to_outputs([], []).
add_delays_to_outputs([O|Os], [sig(O, _, _)|Xs]) :-
    add_delays_to_outputs(Os, Xs).


make_structure(chs(Name, Inputs, Outputs, Subcells), C) :-
    map(make_structure, Subcells, Newsubs),
    C =.. [chs, Name, Inputs, Outputs, Newsubs, _, _],
    make_signals(C, _),
    paths_in_env(C, _).


Dec  9  21:29  1985  PTA/ppp  Page  2 


make_paths(C, [P]) :-
    is_net(C),
    !,
    subcells(C, Net),
    source(C, S),
    drain(C, D),
    P =.. [path, S, D, [[Net]]],
    paths(C, [P]).

make_paths(C, P) :-
    subcells(C, cells(Subcells)),
    length(Subcells, L),
    map(make_paths, Subcells, Subpaths),
    flatten(Subpaths, P1),
    make_all_paths(L, P1, P2),
    input_signals(C, I),
    select_input(P2, I, P3),
    paths(C, P3),
    output_signals(C, O),
    select_output(P3, O, P).

select_input([], _, []).
select_input([P|Ps], I, [P|Rest]) :-
    P =.. [path, X, _, _],
    member(X, I),
    !,
    select_input(Ps, I, Rest).
select_input([P|Ps], I, Rest) :-
    select_input(Ps, I, Rest).

select_output([], _, []).
select_output([P|Ps], O, [P|Rest]) :-
    P =.. [path, _, Y, _],
    member(Y, O),
    !,
    select_output(Ps, O, Rest).
select_output([P|Ps], O, Rest) :-
    select_output(Ps, O, Rest).


% make_all_paths(Length, Short paths, all paths)

make_all_paths(0, Paths, Paths) :- !.

make_all_paths(N, Short_paths, Paths) :-
    cross(Short_paths, Short_paths, Somewhat_longer_paths),
    N1 is N // 2,
    make_all_paths(N1, Somewhat_longer_paths, Paths).

cross([], Short, Short).
cross([Pe|Pes], List, X) :-
    dot(Pe, List, P11),
    cross(Pes, List, P12),
    combine_path_lists(P11, P12, X).

combine_path_lists([], P, P).
combine_path_lists([path(X, Y, P)|Ps], Plist, Newp) :-
    add_new_path(X, Y, P, Plist, Temp),
    combine_path_lists(Ps, Temp, Newp).


dot(_, [], []).

dot(path(X, Y, Paths), [path(Y, Z, P2)|Pes], Result) :-
    diddle(Paths, P2, Longer),
    dot(path(X, Y, Paths), Pes, More),
    add_new_path(X, Z, Longer, More, Result).

dot(path(X, Y, Paths), [path(Z, _, _)|Pes], Result) :-
    Y \= Z,
    dot(path(X, Y, Paths), Pes, Result).


% add_new_path(F, T, Path, Pathlist, Result)
add_new_path(F, T, P, [], [path(F, T, P)]).
add_new_path(F, T, P, [path(F, T, P)|X], [path(F, T, P)|X]).
add_new_path(F, T, P, [path(F, T, P1)|X], [path(F, T, Newp)|X]) :-
    put_paths_together(P, P1, Newp),
    !.
add_new_path(F, T, P, [X|Xs], [X|Y]) :-
    add_new_path(F, T, P, Xs, Y).

put_paths_together([], P, P).
put_paths_together([X|Xs], P, Newp) :-
    member(X, P),
    !,
    put_paths_together(Xs, P, Newp).
put_paths_together([X|Xs], P, Newp) :-
    put_paths_together(Xs, [X|P], Newp).


diddle([], _, []).

diddle([P|Ps], Plist, Result) :-
    add_to_each(P, Plist, R1),
    diddle(Ps, Plist, R2),
    append(R1, R2, Result).

add_to_each(_, [], []).
add_to_each(P, [L|Ls], [R|Rs]) :-
    append(P, L, R),
    add_to_each(P, Ls, Rs).

% attach_orders -- once all the inputs have known delays, the
% gates can be ordered nicely. I do this by sorting the inputs,
% and then attaching the correct positions to each gate

attach_orders(Chs) :-
    inputs(Chs, I),
    sort_inputs(I, Sorted),
    gates(Chs, Glist),
    attach_to_glist(Sorted, Glist, _).

% since I didn't feel like writing a sort routine, I just massaged things
% so that I could use keysort, the prepackaged routine.
sort_inputs(I, Sorted) :-
    make_sortable_inputs(I, Sort),
    keysort(Sort, Ugly),
    beautify(Ugly, Sorted, 1).

% keysort wants its inputs in the form "Key-Value"
make_sortable_inputs([], []).

make_sortable_inputs([In|Ins], [Key-In|Keys]) :-
    input_delay(In, Key),
    make_sortable_inputs(Ins, Keys).

input_delay(in(_, D, _), D).

beautify([], [], _).

beautify([_-In|Ins], [(Sig, Pos)|Rest], Pos) :-
    signal_name(In, Sig),
    P1 is Pos + 1,
    beautify(Ins, Rest, P1).
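
% The keysort trick used by sort_inputs can be tried in isolation. The sketch
% below is only an illustration: the demo_ predicates are hypothetical names
% that do not appear in the listing, and the inputs are assumed to be
% in(Name, Delay, Info) terms with numeric delays.

```prolog
% demo_sort_inputs(+Inputs, -Ranked): rank inputs by arrival delay.
% Ranked is a list of (Name, Position) pairs, earliest input first.
demo_sort_inputs(Inputs, Ranked) :-
    demo_keyed(Inputs, Keyed),
    keysort(Keyed, Sorted),          % keysort/2 sorts Key-Value pairs by Key
    demo_rank(Sorted, 1, Ranked).

% Pair every input with its delay as the sort key.
demo_keyed([], []).
demo_keyed([in(N, D, I)|Ins], [D-in(N, D, I)|Ks]) :-
    demo_keyed(Ins, Ks).

% Drop the keys again and number the inputs from Pos upwards.
demo_rank([], _, []).
demo_rank([_-in(N, _, _)|Ps], Pos, [(N, Pos)|Rest]) :-
    Pos1 is Pos + 1,
    demo_rank(Ps, Pos1, Rest).
```

% For example, demo_sort_inputs([in(a,5,x), in(b,2,y), in(c,9,z)], R)
% ranks b first, then a, then c.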


attach_to_glist(Sorted, G, Order) :-
    is_gate(G),
    signal_name(G, Sig),
    lookup_order(Sig, Sorted, Order).

attach_to_glist(Sorted, series(Gs), Order) :-
    map(order_gate, Sorted, Gs, Orders),
    max(Orders, Order).

attach_to_glist(Sorted, parallel(Gs), Order) :-
    map(attach_to_glist, Sorted, Gs, Orders),
    max(Orders, Order).

% this puts the order in the right field, as well as returning it
order_gate(Sorted, (G, Order), Order) :-
    attach_to_glist(Sorted, G, Order).

lookup_order(Sig, List, Order) :-
    assoc(Sig, List, (Sig, Order)).

order_list(Gs, Newgs) :-
    make_sortable_gates(Gs, L),
    keysort(L, Newl),
    make_sortable_gates(Newgs, Newl).

make_sortable_gates([], []).
make_sortable_gates([G|Gs], [Key-G|Keys]) :-
    gate_order(G, Key),
    make_sortable_gates(Gs, Keys).

gate_order((_, O), O).

/* try_each_gate will go through a series gate to see where the source
   of the maximum delay is. By now, the proper orders have been
   put on the gates, and they're sorted, so I just try each one in
   turn as the trigger. */

/* syntax:
   try_each_gate(Chs, Newgs, Rin, [], Rlist,
                 Cout, Clist,
                 _, Rnet_sym,
                 _, Cnet_sym,
                 0, D,
                 _, Trig).
*/

try_each_gate(_, [], _, _, _,
              _, _,
              Rsf, Rsf,
              Csf, Csf,
              Dsf, Dsf,
              Tsf, Tsf) :- !.
ifdef($fast_interpreter, (
% hacked interpreter
try_each_gate(Chs, [(Gate, _)|Gates], Rin, Rprev, Rg+R_int+R_rest,
              Cout, Cg+C_int+C_rest,
              Rsf, R_sym,
              Csf, C_sym,
              Dsf, D,
              Tsf, T) :-
    Rtot is Rin - Rg,
    Ctot is Cout - Cg,
    % recursive call takes care of nested structures
    net_delay(Chs, Gate, Rtot, Rg_sym, Ctot, Cg_sym, Delay, Trig),
    Delay > Dsf,
    Cout_rest is Ctot - C_int,
    !,
    try_each_gate(Chs, Gates, Rin, Rprev + Rg + R_int, R_rest,
                  Cout_rest, C_rest,
                  Rprev + Rg_sym + R_int + R_rest, R_sym,
                  Cg_sym + C_int + C_rest, C_sym,
                  Delay, D,
                  Trig, T)),

% normal interpreter
(try_each_gate(Chs, [(Gate, _)|Gates], Rin, Rprev, Rg+R_int+R_rest,
               Cout, Cg+C_int+C_rest,
               Rsf, R_sym,
               Csf, C_sym,
               Dsf, D,
               Tsf, T) :-
    evaluate(Rg, R_correction),
    evaluate(Cg, C_correction),
    Rtot is Rin - R_correction,
    Ctot is Cout - C_correction,
    % recursive call takes care of nested structures
    net_delay(Chs, Gate, Rtot, Rg_sym, Ctot, Cg_sym, Delay, Trig),
    Delay > Dsf,
    Cout_rest is Ctot - C_int,
    !,
    try_each_gate(Chs, Gates, Rin, Rprev + Rg + R_int, R_rest,
                  Cout_rest, C_rest,
                  Rprev + Rg_sym + R_int + R_rest, R_sym,
                  Cg_sym + C_int + C_rest, C_sym,
                  Delay, D,
                  Trig, T))).

try_each_gate(Chs, [(Gate, _)|Gates], Rin, Rprev, Rg+R_int+R_rest,
              Cout, Cg+C_int+C_rest,
              Rsf, R_sym,
              Csf, C_sym,
              Dsf, D,
              Tsf, T) :-
    try_each_gate(Chs, Gates, Rin, Rprev + Rg + R_int, R_rest,
                  Cout, C_rest,
                  Rsf, R_sym,
                  Csf, C_sym,
                  Dsf, D,
                  Tsf, T).


/* syntax:
   do_par_gates(Chs, Gs, Rin, Cout,
                Rsf, Rnet_sym,
                Csf, Cnet_sym,
                Dsf, D,
                Tsf, Trig).
*/

do_par_gates(_, [], _, _, Rsf, Rsf, Ctot, Ctot, Dsf, Dsf, Trig, Trig).

do_par_gates(Chs, [G|Gs], Rin, Cout, Rsf, Rtot, Csf, Ctot, Dsf, D, Tsf, T) :-
    net_delay(Chs, G, Rin, Rnew, Cout, Cnew, Delay, Trig),
    Delay > Dsf,
    !,
    do_par_gates(Chs, Gs, Rin, Cout, Rnew, Rtot, Cnew, Ctot, Delay, D, Trig, T).
do_par_gates(Chs, [G|Gs], Rin, Cout, Rsf, Rtot, Csf, Ctot, Dsf, D, Tsf, T) :-
    do_par_gates(Chs, Gs, Rin, Cout, Rsf, Rtot, Csf, Ctot, Dsf, D, Tsf, T).

% individual net delay in a hierarchical environment.
primitive_delay([Chs|Env], Sig) :-
    is_net(Chs),   % only case we handle so far
    subcells(Chs, t(S, G, D)),
    paths_from_input(Env, S, Paths),
    terminal_capacitance(Chs, D, Cout_sym, Cout_num),
    max_delay_in_net(Chs, Paths, G, Cout_num, 0, [], Delay, Trig),
    make_info_rec(Trig, Cout_sym, Chs, Info),
    delay_in_cell(Chs, Sig, Delay, Info).

make_info_rec(info(T, R, C), Cout_sym, Chs, info(T, R, C+Cout_sym, Chs)).

max_delay_in_net(_, [], _, _, Delay, Info, Delay, Info).

max_delay_in_net(Chs, [P|Ps], G, Ct, Dsf, Isf, Delay, Info) :-
    resistance(P, Rin_sym, Rin_num),
    net_delay(Chs, G, Rin_num, Rnet_sym, Ct, C_sym, D, Trig),
    D > Dsf,
    !,
    I =.. [info, Trig, Rin_sym+Rnet_sym, C_sym],
    max_delay_in_net(Chs, Ps, G, Ct, D, I, Delay, Info).

max_delay_in_net(Chs, [P|Ps], G, Ct, Dsf, Isf, Delay, Info) :-
    max_delay_in_net(Chs, Ps, G, Ct, Dsf, Isf, Delay, Info).

glist_resistance(G, R) :-
    combine(gate_resistance, G, R).

net_delay(Chs, G, Rin, Rnet_sym, Cout, Cnet_sym, D, G) :-
    is_gate(G),   % note that the gate resistance is in Rin
    !,
    gate_capacitance(G, Cnet_sym, Cnet_num),
    gate_resistance(G, Rnet_sym, Rnet_num),
    signal_name(G, Sig),
    find_signal_delay(Chs, Sig, Trigger_delay),
    D is Trigger_delay + (Rin + Rnet_num) * (Cout + Cnet_num).


net_delay(Chs, series(Gs), Rin, Rnet_sym, Cout, Cnet_sym, D, Trig) :-
    order_list(Gs, Newgs),
    combine(gate_resistance, series(Newgs), Rlist, R_num),
    % simplify(Rlist, Simp_rlist),
    Rtot is Rin + R_num,
    combine(gate_capacitance, series(Newgs), Clist, C_num),
    % simplify(Clist, Simp_clist),
    Ctot is Cout + C_num,
    try_each_gate(Chs, Newgs, Rtot, 0, Rlist,
                  Ctot, Clist,
                  _, Rnet_sym,
                  _, Cnet_sym,
                  0, D,
                  _, Trig).

net_delay(Chs, parallel(Gs), Rin, Rnet_sym, Cout, Cnet_sym, D, Trig) :-
    do_par_gates(Chs, Gs, Rin, Cout,
                 _, Rnet_sym,
                 _, Cnet_sym,
                 0, D,
                 _, Trig).


find_signal_delay(Chs, Sig, Delay) :-
    find_signal_entry(Sig, Chs, Entry),
    delay(Entry, Delay).

is_gate(gt(_, _, _)).

gate_signals(gt(X, _, _), [X]).

gate_signals(Glist, []) :-
    Glist =.. [F, []].

gate_signals(Glist, Sigs) :-
    Glist =.. [F, [G|Gs]],
    gate_signals(G, X),
    Newglist =.. [F, Gs],
    gate_signals(Newglist, Y),
    append(X, Y, Sigs).


path_in_glist(Gate, Signals, Prev, Gate, Gate) :-
    Gate =.. [gt, Prev, Type, Size],
    \+ member(Prev, Signals).

path_in_glist(series([(G, Order)|Gs]), Signals, Prev, series([(Newg, Order)|Gs]), Gate) :-
    path_in_glist(G, Signals, Prev, Newg, Gate).

path_in_glist(series([G|Gs]), Signals, Prev, series([G|Newgs]), Gate) :-
    path_in_glist(series(Gs), Signals, Prev, series(Newgs), Gate).

path_in_glist(parallel([G|Gs]), Signals, Prev, P, Gate) :-
    path_in_glist(G, Signals, Prev, P, Gate).

path_in_glist(parallel([G|Gs]), Signals, Prev, P, Gate) :-
    path_in_glist(parallel(Gs), Signals, Prev, P, Gate).


paths_from_input([Chs|Env], Sig, Paths) :-
    paths_to(Sig, Chs, P2),
    continuation(Env, P2, Paths).

paths_to(Sig, Chs, [[]]) :-
    is_input(Sig, Chs),
    !.
paths_to(Sig, Chs, Flatp) :-
    paths(Chs, Path_list),
    choose_paths_to(Sig, Path_list, P),
    flatten(P, Flatp).

choose_paths_to(_, [], []).

choose_paths_to(Sig, [Pathrec|Ps], [P|Rest]) :-
    is_path_to(Sig, Pathrec),
    !,
    strip(Pathrec, P),
    choose_paths_to(Sig, Ps, Rest).
choose_paths_to(Sig, [P|Ps], Rest) :-
    choose_paths_to(Sig, Ps, Rest).

strip(path(_, _, P), P).

%  continuation  takes  the  existing  path  list  and  moves  up  the  environment 
%  stack  until  it  finally  makes  it  to  an  input  to  the  whole  kitten  kaboodle 
continuation([], Paths, Paths).
continuation([Chs|Env], Psf, Paths) :-
    print('made it'), nl,
    map(extend, Chs, Psf, Temp),
    flatten(Temp, Newpsf),
    continuation(Env, Newpsf, Paths).

%  extend  takes  a  path,  which  doesn’t  yet  terminate  at  the  inputs  of  the  chs, 
%  and  extends  it  so  that  it  does  terminate  at  an  input 
extend(Chs, Path, [Path]) :-
    input_terminal_of_path(Path, S),
    is_input(S, Chs),
    !.

extend(Chs, Path, Path_list) :-
    input_terminal_of_path(Path, S),
    paths_to(S, Chs, P),
    map(add_to_end, Path, P, Path_list).

% bleah, but this is due to the restrictions of map
add_to_end(path(_, Y, Plist1), path(X, _, Plist2), path(X, Y, Plist3)) :-
    diddle(Plist1, Plist2, Plist3).

input_terminal_of_path(path(X, _, _), X).
is_path_to(Sig, path(_, Sig, _)).

% take a chs with the signals filled in, and find a critical path

find_critical_path(Chs, Cp) :-
    signals(Chs, Signals),
    max_output_delay(Signals, Cp).

max_output_delay([], Entry) :-
    make_dummy_signal_entry(Entry).

max_output_delay([Sig|Sigs], Entry) :-
    max_output_delay(Sigs, E1),
    bigger_delay(Sig, E1, Entry).

% find a signal's predecessor in the critical path
prev_cp_entry(Info, Next) :-
    trigger(Info, Trig),
    trig_chs(Info, Chs),
    signal_name(Trig, Sig),
    find_signal_entry(Sig, Chs, Next).

prev_cp_entry(Sig, Next) :-
    info(Sig, Info),
    prev_cp_entry(Info, Next).

% find the delay equations on a path

delay_equation(Cp, Input_delay) :-
    no_predecessor(Cp),
    !,
    delay(Cp, Input_delay).

delay_equation(Cp, R * C + Rest) :-
    info(Cp, Info),
    symbolic_r(Info, R),
    symbolic_c(Info, C),
    prev_cp_entry(Info, Next),
    delay_equation(Next, Rest).

symbolic_r(I, R) :-
    arg(2, I, R).

symbolic_c(I, C) :-
    arg(3, I, C).

% this is the electrical model, measuring resistance and capacitance
% for a gate or path of gates
resistance([], 0, 0).

resistance([t(_, Glist, _)|Ts], R1_sym+R2_sym, R) :-
    resistance(Ts, R1_sym, R1_num),
    combine(gate_resistance, Glist, R2_sym, R2_num),
    R is R1_num+R2_num.

capacitance([], 0, 0).

capacitance([t(_, Glist, _)|Ts], C1_sym+C2_sym, C) :-
    capacitance(Ts, C1_sym, C1_num),
    combine(gate_capacitance, Glist, C2_sym, C2_num),
    C is C1_num+C2_num.

ifdef($fast_interpreter, (
gate_resistance(gt(_, T, S), Rg / S, Rnum) :-
    gate_resistance(T, Rg),
    Rnum is Rg/S),

% otherwise
(gate_resistance(gt(_, T, S), Rg / S, Rnum) :-
    gate_resistance(T, Rg),
    evaluate(Rg/S, Rnum))).

ifdef($fast_interpreter, (
% hacked interpreter
gate_capacitance(gt(_, Type, S), S * Ctot, Cnum) :-
    gate_channel_cap(Type, C1),
    gate_drain_cap(Type, C2),
    Ctot is C1 + 2*C2,
    Cnum is S * Ctot),

% usual interpreter
(gate_capacitance(gt(_, Type, S), S * Ctot, Cnum) :-
    gate_channel_cap(Type, C1),
    gate_drain_cap(Type, C2),
    Ctot is C1 + 2*C2,
    evaluate(S * Ctot, Cnum))).

%  these  are  just  reasonable  constants 
gate_resistance(n, 8).
gate_resistance(p, 10).
gate_resistance(interconnect, 0).

gate_capacitance(interconnect, 1).
gate_channel_cap(n, 4).
gate_channel_cap(p, 4).
gate_drain_cap(n, 2.5).
gate_drain_cap(p, 2.5).
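
% How the size of a gate enters this electrical model can be checked with a
% small numeric sketch. It reuses the constants above under hypothetical
% demo_ names (none of these predicates are in the listing) and assumes a
% single transistor of the given type and size.

```prolog
% Per-type constants, copied from the model above.
demo_r(n, 8).
demo_r(p, 10).
demo_channel_cap(n, 4).
demo_channel_cap(p, 4).
demo_drain_cap(n, 2.5).
demo_drain_cap(p, 2.5).

% demo_gate_rc(+Type, +Size, -R, -C):
% resistance falls as 1/Size, capacitance grows linearly with Size.
demo_gate_rc(Type, Size, R, C) :-
    demo_r(Type, Rg),
    demo_channel_cap(Type, C1),
    demo_drain_cap(Type, C2),
    R is Rg / Size,
    C is Size * (C1 + 2*C2).
```

% For an n-type gate of size 2 this gives R = 4 and C = 18.0. The product
% R*C of a gate with its own capacitance does not depend on Size, so in this
% model any speedup from a larger transistor comes from driving the load it
% sees, not from its own capacitance.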

interconnect_resistance(0).
interconnect_capacitance(1).


% preprocess: get the output capacitance symbolically at the beginning

ifdef($fast_interpreter, (
% hacked interpreter
terminal_capacitance(Chs, Sig, C_sym, C_num) :-
    outputs(Chs, O),
    assoc(Sig, O, Entry),
    symbolic_cap(Entry, C_sym),
    C_num is C_sym),

% normal interpreter
(terminal_capacitance(Chs, Sig, C_sym, C_num) :-
    outputs(Chs, O),
    assoc(Sig, O, Entry),
    symbolic_cap(Entry, C_sym),
    evaluate(C_sym, C_num))).

symbolic_terminal_capacitance([], Output) :-
    numeric_output_cap(Output, C),
    symbolic_cap(Output, C).

symbolic_terminal_capacitance([Env|_], Output) :-
    signal_name(Output, Sig),
    output_capacitance(Env, Sig, Ocap),
    subcells(Env, cells(Cellist)),
    sum_gate_capacitance(Sig, Cellist, Gcap),
    simplify(Ocap+Gcap, Cap),
    symbolic_cap(Output, Cap).

output_capacitance(Env, Sig, C) :-
    outputs(Env, Os),
    assoc(Sig, Os, Output),
    numeric_output_cap(Output, C),
    !.
output_capacitance(_, _, 0).   % because we don't want to fail if it's not an output

sum_gate_capacitance(_, [], 0).

sum_gate_capacitance(Sig, [Sc|Scs], C1 + C2) :-
    subcell_capacitance(Sig, Sc, C1),
    sum_gate_capacitance(Sig, Scs, C2).

subcell_capacitance(Sig, Chs, C) :-
    is_input(Sig, Chs),
    is_net(Chs),   % only case we handle so far
    !,
    gates(Chs, G),
    signal_gates_cap(Sig, G, C).

subcell_capacitance(_, _, 0).   % if it's not an input of that cell

signal_gates_cap(Sig, (G, _), C) :-   % ugliness for series xsistors
    signal_gates_cap(Sig, G, C), !.

signal_gates_cap(Sig, gt(Sig, Type, Size), C1 + C2) :-
    symbolic_gate_channel_cap(gt(Sig, Type, Size), C1),
    gate_capacitance(interconnect, C2).

signal_gates_cap(Sig, gt(Other, _, _), 0) :-
    Sig \== Other.

signal_gates_cap(Sig, Glist, 0) :-
    Glist =.. [X, []].

signal_gates_cap(Sig, Glist, C1 + C2) :-
    Glist =.. [X, [G|Gs]],
    signal_gates_cap(Sig, G, C1),
    Newglist =.. [X, Gs],
    signal_gates_cap(Sig, Newglist, C2).

symbolic_gate_channel_cap(gt(_, T, S), C * S) :-
    gate_channel_cap(T, C).

% combine(Functor, Glist, Symbolic, Numeric) -- used to sum or take max
% of resistance or capacitance, the combining rules are the same

combine(Functor, gt(Sig, Type, Size), Rsym, Rnum) :-
    P =.. [Functor, gt(Sig, Type, Size), Rsym, Rnum],
    call(P).

combine(_, series([]), 0, 0).

combine(Functor, series([(G, _)|Gs]), R1_sym+R_int+R2_sym, Rnum) :-
    combine(Functor, G, R1_sym, R1_num),
    combine(Functor, series(Gs), R2_sym, R2_num),
    P =.. [Functor, interconnect, R_int],
    call(P),
    Rnum is R1_num+R2_num+R_int.

combine(_, parallel([]), 0, 0).

combine(Functor, parallel([G|Gs]), Rmax_sym, Rnum) :-
    combine(Functor, G, R1_sym, R1_num),
    combine(Functor, parallel(Gs), R2_sym, R2_num),
    (R1_num > R2_num ->
        Rmax_sym = R1_sym,
        Rnum is R1_num ;
        Rmax_sym = R2_sym,
        Rnum is R2_num).

% utils -- various (non-problem-dependent) utilities

% assoc(X,Y,Z): Z is the member of Y with X as its first element

assoc(X, [Y|_], Y) :- arg(1, Y, X).
assoc(X, [_|Ys], Z) :- assoc(X, Ys, Z).

member(X, [X|_]).
member(X, [_|L]) :- member(X, L).

% var_member can't use unify, which would bind the variables by mistake
var_member(Var, [Var1|_]) :- Var == Var1, !.
var_member(Var, [_|Vars]) :- var_member(Var, Vars), !.

max(A, B, A) :- A >= B.
max(A, B, B) :- B > A.

min(A, B, A) :- A =< B.
min(A, B, B) :- B < A.

cpu :-
    N is cputime,
    print('cpu time is '),
    print(N),
    nl,
    !.

print_list([]).
print_list([L|Ls]) :-
    print(L),
    nl,
    print_list(Ls).

sum([], 0).

ifdef($fast_interpreter,
(sum([X|Xs], Tot) :-
    sum(Xs, Sub),
    Tot is X + Sub),

% otherwise
(sum([X|Xs], Tot) :-
    (number(X),
     sum(Xs, Sub),
     Tot is X + Sub ;
     var(X),
     sum(Xs, Sub),
     Tot is Sub))).

max([], 0).
max([X|Xs], Max) :-
    max(Xs, M),
    max(X, M, Max).

map(_, [], []).

map(Functor, [L|Ls], [Newl|Newls]) :-
    P =.. [Functor, L, Newl],
    call(P),
    !,
    map(Functor, Ls, Newls).

map(_, _, [], []).

map(Functor, Args, [L|Ls], [Newl|Newls]) :-
    P =.. [Functor, Args, L, Newl],
    call(P),
    !,
    map(Functor, Args, Ls, Newls).

range(Lo, Lo, _).
range(Lo, N, Hi) :-
    New is Lo+1,
    New =< Hi,
    !,
    range(New, N, Hi).

% another thing that should be an operator, by the way
abs(X, X) :- X > 0, !.
abs(X, Y) :- Y is -X, !.

% flatten a list (i.e., put sublists into the main list)

flatten([], []).
flatten([X|Xs], Res) :-
    flatten(Xs, Temp),
    append(X, Temp, Res).

% a particularly ugly n-squared algorithm for removing duplicates from the list
remove_dupes([], L, L).

remove_dupes([X|Xs], L, Result) :-
    member(X, L),
    !,
    remove_dupes(Xs, L, Result).

remove_dupes([X|Xs], L, Result) :-
    remove_dupes(Xs, [X|L], Result).

append([], Result, Result).
append([X|Xs], Y, [X|Temp]) :-
    append(Xs, Y, Temp).

apply_to_each(F, []).
apply_to_each(F, [L|Ls]) :-
    P =.. [F, L],
    call(P),
    apply_to_each(F, Ls).

apply_to_each(F, Arg, []).
apply_to_each(F, Arg, [L|Ls]) :-
    P =.. [F, Arg, L],
    call(P),
    apply_to_each(F, Arg, Ls).


ff :- put(12).

/* symbolic mathematics: evaluating, simplifying, and taking derivatives
   of equations. Also, collecting all the variables in a given equation */

% evaluate a symbolic equation.

evaluate(X, X) :-
    number(X),
    !.

evaluate(X, K) :-
    var(X),
    !,
    minimum_gate_size(K).

evaluate(S+Mod, Z) :-
    (var(Mod) ->
        evaluate(S, Z) ;
        evaluate(S, R1),
        evaluate(Mod, R2),
        Z is R1+R2),
    !.

evaluate(X, Result) :-
    X =.. [Op, Arg1, Arg2],
    evaluate(Arg1, R1),
    evaluate(Arg2, R2),
    Y =.. [Op, R1, R2],
    Result is Y,
    !.


% equation simplifier. Doesn't even worry about the distributive law:
% speed is the key. The main purpose of simplify is to get rid of zeros
% being added in.

simplify(Exp, Exp) :-
    number(Exp),
    !.

simplify(Exp, Exp) :-
    var(Exp),
    !.

simplify(Exp, Res) :-
    Exp =.. [Op, Arg1, Arg2],
    simplify(Arg1, New1),
    simplify(Arg2, New2),
    combine_simplified_terms(Op, New1, New2, Res).

% combine_simplified_terms is where the zeroes (and other constants) are
% removed if possible

combine_simplified_terms(Op, Identity, Res, Res) :-
    left_identity(Op, Id),
    Id == Identity,
    !.

combine_simplified_terms(Op, Res, Identity, Res) :-
    right_identity(Op, Id),
    Id == Identity,
    !.

combine_simplified_terms('*', Arg1, Arg2, 0) :-
    (Arg1 == 0 ;   % could be more general, with a "nullity" like
     Arg2 == 0),   % "identity", but why bother?
    !.

combine_simplified_terms('/', Arg1, _, 0) :-
    Arg1 == 0,
    !.

combine_simplified_terms(Op, Arg1, Arg2, Res) :-
    number(Arg1),
    number(Arg2),
    !,
    P =.. [Op, Arg1, Arg2],
    Res is P.

combine_simplified_terms(Op, Arg1, Arg2, Res) :-
    Res =.. [Op, Arg1, Arg2].

left_identity('+', 0).
left_identity('*', 1).

right_identity('+', 0).
right_identity('-', 0).
right_identity('*', 1).
right_identity('/', 1).
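
% The effect of the simplifier is easiest to see on a small expression. The
% following is a reduced sketch and not the code above: the demo_ predicates
% are hypothetical, only the identities for + and * are handled, and the
% identity tests use == so that unbound size variables are never bound by
% accident.

```prolog
% demo_simplify(+Expr, -Simplified): strip added zeros and multiplied ones,
% and fold subexpressions that are fully numeric.
demo_simplify(E, E) :- var(E), !.
demo_simplify(E, E) :- number(E), !.
demo_simplify(E, R) :-
    E =.. [Op, A, B],
    demo_simplify(A, A1),
    demo_simplify(B, B1),
    demo_join(Op, A1, B1, R).

% demo_join(+Op, +Left, +Right, -Result): combine two simplified operands.
demo_join('+', A, B, B) :- A == 0, !.
demo_join('+', A, B, A) :- B == 0, !.
demo_join('*', A, B, B) :- A == 1, !.
demo_join('*', A, B, A) :- B == 1, !.
demo_join('*', A, B, 0) :- ( A == 0 ; B == 0 ), !.
demo_join(Op, A, B, R) :-
    number(A), number(B), !,
    T =.. [Op, A, B],
    R is T.
demo_join(Op, A, B, R) :-
    R =.. [Op, A, B].
```

% For example, demo_simplify(0 + X*1 + 2*3, R) leaves X unbound and
% returns R = X+6.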


% symbolic derivatives. This could be data-directed, storing the
% proper information for each operator, but that
% an attempt would still need to be made to unify with each. This is
% very slow and special-purpose right now; however, the equations will
% not have any weird operators in them

% some attempt is made to avoid what the symbolic math people call
% "intermediate expression swell" by being somewhat intelligent. That's
% why all the independence checks. However, with a large equation, these
% checks wind up taking a lot of time


deriv(Eq,Var,1) :- Eq == Var, !.
deriv(Eq,_,0) :- number(Eq), !.
deriv(Eq,Var,0) :- var(Eq), Eq \== Var, !.
deriv(U+V,X,Dv) :-
    independent(U,X),
    !,
    deriv(V,X,Dv),
    !.
deriv(U+V,X,Du+Dv) :-
    deriv(U,X,Du),
    deriv(V,X,Dv),
    !.
deriv(U*V,X,Res) :-
    deriv(V,X,Dv),
    (independent(U,X) ->
        Res = U*Dv ;
        deriv(U,X,Du),
        Res = Du*V + Dv*U),
    !.
deriv(U/V,X,Res) :-
    independent(V,X),
    (number(U) -> Res = 0 ;
        deriv(U,X,Du),
        Res = Du/V),
    !.
deriv(U/V,X,MinU*Dv/(V*V)) :-
    number(U),
    MinU is -U,
    deriv(V,X,Dv),
    !.
deriv(U/V,X,(V*Du-U*Dv)/(V*V)) :-
    deriv(U,X,Du),
    deriv(V,X,Dv),
    !.
deriv(U-V,X,Du-Dv) :-
    deriv(U,X,Du),
    deriv(V,X,Dv),
    !.
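
% A minimal, self-contained sketch of how an independence check keeps
% derivatives small.  It covers only + and *, and the names indep/2 and
% d/3 are hypothetical stand-ins chosen for this illustration; they are
% not the listing's own predicates.  No simplification is attempted.

```prolog
% indep(E, X): variable X does not occur in expression E.
indep(E, _) :- number(E), !.
indep(E, X) :- var(E), !, E \== X.
indep(E, X) :- E =.. [_, A, B], indep(A, X), indep(B, X).

% d(E, X, D): D is an unsimplified derivative of E with respect to X.
d(E, X, 1) :- E == X, !.
d(E, X, 0) :- indep(E, X), !.
d(U+V, X, Dv) :- indep(U, X), !, d(V, X, Dv).       % drop the constant term
d(U+V, X, Du+Dv) :- !, d(U, X, Du), d(V, X, Dv).
d(U*V, X, U*Dv) :- indep(U, X), !, d(V, X, Dv).     % constant factor
d(U*V, X, Du*V+Dv*U) :- d(U, X, Du), d(V, X, Dv).   % full product rule
```

% For example, d(2+Y*X, X, D) gives D = Y*1, where a rule set without the
% independence checks would build 0+(0*X+1*Y) for the same input.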

% take all the partial derivatives of the equation

partials(Eq,Vars,Partials) :-
    map(partial,Eq,Vars,Partials).

% should be
%     map(deriv,Eq,Vars,Big_partials),
%     map(simplify,Big_partials,Partials).
% but since there's no TRO it won't work like that.

partial(Eq,Var,Partial) :-
    deriv(Eq,Var,Temp),
    simplify(Temp,Temp_partial),
    % keep the variables right
    asserta('$this_partial'(Eq,Var,Temp_partial)),
    fail.
partial(Eq,Var,Partial) :-
    retract('$this_partial'(_,_,Partial)),
    !.

% check if an equation is independent of a variable -- i.e., if that variable
% does NOT appear in the equation

independent(Exp,Var) :-
    number(Exp),
    !.
independent(Exp,Var) :-
    var(Exp),
    Exp \== Var,
    !.
independent(Exp,Var) :-
    \+ var(Exp),
    Exp =.. [_,Arg1,Arg2],
    independent(Arg1,Var),
    independent(Arg2,Var).


% collect all the variables of an equation

collect_vars(Eq,Vsf,Vsf) :- number(Eq), !.
collect_vars(Eq,Vsf,Vsf) :- var(Eq), varmember(Eq,Vsf), !.
collect_vars(Eq,Vsf,[Eq|Vsf]) :- var(Eq), !.
collect_vars(Eq,Vsf,Vars) :-
    Eq =.. [_,Arg1,Arg2],
    collect_vars(Arg1,Vsf,V1),
    collect_vars(Arg2,V1,Vars),
    !.

/*  STRUCTS: data structure access functions
    of course, these should be macros, but you can't do that in cprolog

    There are two kinds of chs's: the initial 4-field chs, and the
    local 6-field chs.  The 4-field chs looks like

        chs(
            Name
            inputs([Inputlist])
            outputs([Outputlist])
            Subcells)

    where Subcells is either
            cells([Subcelllist])
        or  t(Source,Gatelist,Drain)

    the 6-field chs is a slight extension of that

        chs(
            Name
            inputs([Inputlist])
            outputs([Outputlist])
            Subcells
            [Pathlist]
            [Signal_list])
*/

chs_name(Chs,N) :-
    arg(1,Chs,N).

inputs(Chs,X) :-
    arg(2,Chs,inputs(X)).

outputs(Chs,X) :-
    arg(3,Chs,outputs(X)).

subcells(Chs,X) :-
    arg(4,Chs,X).

paths(Env,P) :-
    arg(5,Env,P).

signals(Chs,Sigs) :-
    arg(6,Chs,Sigs).


/*  "is" procedures are used both to test and generate.
    is_input(X,Chs) will instantiate X in turn to each input of the chs
    or, if X is instantiated, fail if it is not an input
*/

is_input(X,Chs) :-
    inputs(Chs,I),
    assoc(X,I,_).

is_output(X,Chs) :-
    outputs(Chs,O),
    assoc(X,O,_).

is_subcell(t(S,G,D),Chs) :-
    subcells(Chs,t(S,G,D)).
is_subcell(C,Chs) :-
    subcells(Chs,cells(Cs)),
    member(C,Cs).

is_signal(Chs,X) :-
    signals(Chs,Sigs),
    member(X,Sigs).

is_net(Chs) :-
    subcells(Chs,t(_,_,_)).
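
% A small sketch of the "test and generate" idiom on a 4-field chs.  The
% cell example_chs/1, the names cell_inputs/2, cell_input/2 and memb/2,
% and the use of plain list membership in place of assoc/3 are all
% assumptions made so the example stands alone.

```prolog
% A hypothetical cell with two inputs and one output.
example_chs(chs(inv,
                inputs([in(a,0), in(b,3)]),
                outputs([out(z,5)]),
                cells([]))).

% Local list membership, so the sketch needs no library.
memb(X, [X|_]).
memb(X, [_|T]) :- memb(X, T).

% Field access by position, in the style of the accessors above.
cell_inputs(Chs, Ins) :- arg(2, Chs, inputs(Ins)).

% cell_input(?Name, +Chs): enumerates input names when Name is unbound,
% and tests membership when Name is bound.
cell_input(Name, Chs) :-
    cell_inputs(Chs, Ins),
    memb(in(Name,_), Ins).
```

% With Name unbound, backtracking yields a and then b; with Name bound to
% b the call succeeds, and with Name bound to q it fails.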

/*  outputs are of the form
        out(Signal,Cap,Symbolic_capacitance)
    inputs are
        in(Signal,Delay)
*/

signal_name(O,Sig) :-
    arg(1,O,Sig).

output_signals(Chs,X) :-
    outputs(Chs,Y),
    map(signal_name,Y,X).

input_signals(Chs,X) :-
    inputs(Chs,Y),
    map(signal_name,Y,X).

input_delay(Input,Delay) :-
    arg(2,Input,Delay).

numeric_output_cap(Output,C) :- arg(2,Output,C).

symbolic_cap(Output,Eq) :- arg(3,Output,Eq).

% find the input delay of a signal in a Chs

input_delay(Signal,Chs,Delay) :-
    inputs(Chs,I),
    assoc(Signal,I,Temp),
    input_delay(Temp,Delay).

source(Chs,S) :-
    subcells(Chs,t(S,_,_)).

drain(Chs,D) :-
    subcells(Chs,t(_,_,D)).

gates(Chs,G) :-
    subcells(Chs,t(_,G,_)).


% a delay entry looks like
%     delay_entry(Signal,Delay,Prev,Path)

delay(Signal,Delays,D) :-
    delay_entry(Delays,Signal,D,_,_).

delay_entry(Delays,Signal,Delay,Prev,Path) :-
    assoc(Signal,Delays,delay_entry(Signal,Delay,Prev,Path)).

in_gate(X,T) :-
    in_gate(X,T,_).

in_gate(X,t(S,Glist,D),G) :-
    Glist =.. [_,L],
    in_glist(X,L,G).

in_glist(X,[gt(X,Y,Z)|_],gt(X,Y,Z)) :- !.
in_glist(X,[Glist|_],G) :-
    Glist =.. [_,L],        % note that this won't match gt(Sig,Type)
    in_glist(X,L,G),
    !.
in_glist(X,[Y|Ys],G) :-
    in_glist(X,Ys,G).


gate_size(S,S) :-
    number(S),          % handle both integers and reals
    !.
gate_size(S,K) :-
    var(S),
    !,
    minimum_gate_size(K).
gate_size(S+Mod,X) :-
    var(Mod),
    gate_size(S,X).
gate_size(S+Mod,X) :-
    number(Mod),
    gate_size(S,Size),
    X is Size+Mod.

minimum_gate_size(2).

trigger(Info_rec,Trigger) :- arg(1,Info_rec,Trigger).

sig_chs(Info_rec,Chs) :- arg(4,Info_rec,Chs).

sig_delay(sig(_,Delay,_),Delay).
% user print functions

portray(chs(N,I,O,cells(C))) :-
    print('chs '),print(N),nl,
    print('inputs = '),print(I),nl,
    print('outputs = '),print(O),nl,
    print('subcells are '),print_list(C).
portray(chs(N,I,O,t(S,G,D))) :-
    print(t(S,G,D)).

% the six-field local chs also needs a portray function

portray(chs(N,I,O,cells(C),P,S)) :-
    print('chs '),print(N),nl,
    print('inputs = '),print(I),nl,
    print('outputs = '),print(O),nl,
    (var(P) -> true ; print('paths are '),print_list(P)),
    (var(S) -> true ; print('signals are '),nl,print_list(S)),
    print('subcells are '),print_list(C).
portray(chs(N,I,O,t(S,G,D),_,_)) :-
    print('net '),print(N),nl,
    print('inputs = '),print(I),nl,
    print('outputs = '),print(O),nl,
    print(t(S,G,D)).

print_cp(Entry) :-
    print('Node '),
    signal_name(Entry,Name),
    print(Name),
    print(' is driven at '),
    sig_delay(Entry,Delay),
    print(Delay),
    (info(Entry,[]) -> nl ;
        info(Entry,Info),
        trigger(Info,Trig),
        print(' via '),print(Trig),print(' after '),nl,
        prev_cp_entry(Info,Next),
        print_cp(Next)).

/*  make chs: take a standard every-day chs, and make it into a local
    data structure with all the appropriate fields.  Essentially,
    what needs to be done is to add the fields for paths and signals,
    and the extra delay and info fields in the inputs */

make_lds(Chs,Lds) :-
    make_struct(Lds),
    chs_name(Chs,Name),
    chs_name(Lds,Name),
    inputs(Chs,Inputs),
    map(add_input_fields,Inputs,Newin),
    inputs(Lds,Newin),
    outputs(Chs,Outputs),
    map(add_output_fields,Outputs,Newouts),
    outputs(Lds,Newouts),
    (is_primitive(Chs) ->
        (subcells(Chs,Net),
         subcells(Lds,Net)) ;
        % otherwise
        subcells(Chs,cells(Cells)),
        map(make_lds,Cells,Newsubs),
        subcells(Lds,cells(Newsubs))),
    !.

% the local data structure is the 6-field chs
make_struct(chs(_,_,_,_,_,_)).

% who knows how many fields there will be there to begin with?
% It could be 1, 2, or 3.  Make it 3 in the lds.

add_input_fields(in(Name,Delay,Info),in(Name,Delay,Info)).

% if it has a delay there, that can only mean it's an input to the
% whole circuit.

add_input_fields(in(Name,Delay),in(Name,Delay,[])).
add_input_fields(in(Name),in(Name,_,_)).

add_output_fields(out(Name,Cap),out(Name,Cap,_)).
add_output_fields(out(Name),out(Name,_,_)).

pp(L) :-
    chs(C),
    make_lds(C,L),
    make_signals(L,_),
    make_paths(L,_),
    make_outputs([],L).

chs(C) :- C =.. [chs,_,_,_,_], call(C).

% the badly-named "make_sizes" collects the sizes of all the primitives
% and crams them into a vector

make_sizes(L,S) :-
    is_primitive(L),
    !,
    primitive_sizes(L,S).
make_sizes(L,S) :-
    subcells(L,cells(C)),
    map(make_sizes,C,NestedS),
    flatten(NestedS,S),
    assert(sizes_to_try(S)).

primitive_sizes(L,S) :-
    is_net(L),
    net_glist(L,G),
    glist_sizes(G,S).

net_glist(C,G) :-
    subcells(C,t(_,G,_)).

% remember that a glist may be a series or parallel connection of gts

glist_sizes(gt(_,_,S),[S]) :- !.
glist_sizes(Glist,S) :-
    Glist =.. [_,Gates],
    map(glist_sizes,Gates,Sizes),
    flatten(Sizes,S).
% this handles the case of series gates with orders attached
glist_sizes((G,_),S) :- glist_sizes(G,S).

no_predecessor(Cp) :-
    info(Cp,[]),
    !.

% random number generation: I stole this.


%   File   : /usr/lib/prolog/random
%   Author : R.A.O'Keefe
%   Updated: 27 October 83
%   Purpose: to provide a decent random number generator in C-Prolog.

%   This is algorithm AS 183 from Applied Statistics.  I also have a C
%   version.  It is really very good.  It is straightforward to make a
%   version which yields 15-bit random integers using only integer
%   arithmetic.

'$rstate'(27134, 9213, 17773).          % initial state

getrand('$rstate'(X,Y,Z)) :-            % return current state
    '$rstate'(X,Y,Z).

setrand('$rstate'(X,Y,Z)) :-
    integer(X), X > 0, X < 30269,
    integer(Y), Y > 0, Y < 30307,
    integer(Z), Z > 0, Z < 30323,
    retract('$rstate'(_,_,_)),
    asserta('$rstate'(X,Y,Z)), !.

%   random(R) binds R to a new random number in [0.0,1.0)

random(R) :-
    retract('$rstate'(A0,B0,C0)),
    A1 is (A0*171) mod 30269,
    B1 is (B0*172) mod 30307,
    C1 is (C0*170) mod 30323,
    asserta('$rstate'(A1,B1,C1)),
    T is (A1/30269.0) + (B1/30307.0) + (C1/30323.0),
    R is T-floor(T), !.

%   random_int(L, U, R) binds R to a random integer in [L,U)
%   when L and U are integers (note that U will NEVER be generated),

random_int(L, U, R) :-
    integer(L), integer(U),
    random(X), !,
    R is L+floor((U-L)*X).

%   random(L, U, R) binds R to a random real in [L,U)
%   when L and U are numbers (note that U will NEVER be generated),

random(L, U, R) :-
    number(L), number(U),
    random(X), !,
    R is L+((U-L)*X).
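
% A sketch of the same three-generator step with the state passed as an
% argument instead of being kept in the database, which makes a run easy
% to reproduce.  The names wh_next/3, wh_int/5 and the state term rs/3
% are hypothetical; the multipliers and moduli are those of random/1.

```prolog
% wh_next(+State0, -State, -R): advance the state and return R in [0.0,1.0).
wh_next(rs(A0,B0,C0), rs(A1,B1,C1), R) :-
    A1 is (A0*171) mod 30269,
    B1 is (B0*172) mod 30307,
    C1 is (C0*170) mod 30323,
    T is A1/30269.0 + B1/30307.0 + C1/30323.0,
    R is T - floor(T).

% wh_int(+L, +U, +State0, -State, -N): integer N with L =< N < U.
wh_int(L, U, S0, S, N) :-
    wh_next(S0, S, R),
    N is L + floor((U-L)*R).
```

% Starting from the initial state rs(27134,9213,17773), one step gives
% the state rs(8757,8672,19433).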
% minimize an equation -- set it below a given delay
%     minimize(Eq,Delay,Vars,Constraints,Derivs)
% a variable is not allowed to be less than the corresponding constraint

minimize(Eq,Delay,Vars,Constraints,Derivs,Result) :-
    length(Vars,Length),
    min_init_configuration(Vars),
    repeat,
    get_configuration(Vars),
    Result is Eq,
    print('result is '),print(Result),nl,
    (Result < Delay ->
        true ;
        min_new_configuration(Length,Vars,Constraints,Derivs),
        fail).

min_init_configuration(Vars) :-
    map(zero,Vars,New),
    asserta('$min_config'(New)),
    !.

zero(_,0).

get_configuration(Vars) :-
    retract('$min_config'(Vars)),
    print('New config is '),print(Vars),nl,
    !.

min_new_configuration(Length,Vars,Constraints,Derivs) :-
    build_new_config(Length,Vars,Constraints,Derivs,New),
    asserta('$min_config'(New)),
    !.

build_new_config(Length,Vars,Constraints,Derivs,New) :-
    evaluate_derivs(Derivs,Num_derivs,0,Sum),
    Avg is Sum/Length,
    normalize(Avg,Vars,Derivs,Constraints,New).

evaluate_derivs([],[],Sum,Sum).
evaluate_derivs([D|Ds],[N|Ns],Ssf,Sum) :-
    N is D,
    Temp is -N,
    max(0,Temp,Est),
    New_sum is Est + Ssf,
    !,
    evaluate_derivs(Ds,Ns,New_sum,Sum).

normalize(Sum,[],[],[],[]).
normalize(Avg,[V|Vs],[D|Ds],[C|Cs],[N|Ns]) :-
    Desired is V - D/Avg,
    (Desired < C ->
        N is C ;
        N is Desired),
    !,
    normalize(Avg,Vs,Ds,Cs,Ns).
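
% A sketch of one update step on plain numeric lists, to show what
% evaluate_derivs and normalize compute together.  The names
% sum_descent/2, clamp_step/5 and update/4 are hypothetical, and the
% arithmetic function max/2 is assumed to be available (the listing
% itself uses a max/3 predicate).  The Sum > 0 guard is an addition of
% this sketch; it avoids dividing by zero when no derivative is negative.

```prolog
% sum_descent(+Ds, -Sum): Sum of max(0, -D) over the numeric derivatives.
sum_descent([], 0).
sum_descent([D|Ds], Sum) :-
    sum_descent(Ds, Rest),
    Sum is max(0, -D) + Rest.

% clamp_step(+Avg, +Vs, +Ds, +Cs, -Ns): move each V by -D/Avg, but never
% below its constraint C.
clamp_step(_, [], [], [], []).
clamp_step(Avg, [V|Vs], [D|Ds], [C|Cs], [N|Ns]) :-
    N is max(C, V - D/Avg),
    clamp_step(Avg, Vs, Ds, Cs, Ns).

% update(+Vs, +Ds, +Cs, -Ns): one iteration; fails if no derivative is negative.
update(Vs, Ds, Cs, Ns) :-
    sum_descent(Ds, Sum),
    Sum > 0,
    length(Vs, Len),
    Avg is Sum/Len,
    clamp_step(Avg, Vs, Ds, Cs, Ns).
```

% For example, with sizes [2,2], derivatives [-6,-2] and constraints
% [2,2], the average descent is 4 and the new sizes are 3.5 and 2.5.
% Starting from [0,0] with derivatives [-4,-2], both steps fall short of
% the constraint, so both sizes are clamped to 2.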


